POLYPEPTIDES HAVING A FUNCTIONAL DOMAIN OF INTEREST
AND METHODS OF IDENTIFYING AND USING SAME

This application is a continuation-in-part of co-pending
U.S. Patent Application Serial No. 08/417,872, filed
April 7, 1995, the entire contents of which are incorporated 
herein by reference. 

1. Introduction

The present invention is directed to polypeptides
having a functional domain of interest or functional
equivalents thereof. Methods of identifying these
polypeptides are described, along with various methods of
their use, including but not limited to targeted drug
discovery.

2. Background of the Invention 

Combinatorial libraries represent exciting new tools
in basic science research and drug design. It is possible
through synthetic chemistry or molecular biology to generate
libraries of complex polymers, with many subunit permutations.
There are many guises to these libraries: random peptides,
which can be synthesized on plastic pins (Geysen et al., 1987,
J. Immunol. Meth. 102:259-274), beads (Lam et al., 1991,
Nature 354:82-84) or in a soluble form (Houghten et al., 1991,
Nature 354:84-86) or expressed on the surface of viral
particles (Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA
87:6378-6382; Kay et al., 1993, Gene 128:59-65; Scott and
Smith, 1990, Science 249:386-390); nucleic acids (Ellington
and Szostak, 1990, Nature 346:818-822; Gao et al., 1994, Proc.
Natl. Acad. Sci. USA 91:11207-11211; Tuerk and Gold, 1990,
Science 249:505-510); and small organic molecules (Gordon et
al., 1994, J. Med. Chem. 37:1385-1401). These libraries are
very useful in mapping protein-protein interactions and
discovering drugs.

Phage display has become a powerful method for 
screening populations of peptides, mutagenized proteins, and
cDNAs for members that have affinity to target molecules of
interest. It is possible to generate 10»-10' different 
recombinants from which one or more clones can be selected 
with affinity to antigens, antibodies, cell surface receptors, 
protein chaperones, DNA, metal ions, etc. Screening libraries
is versatile because the displayed elements are expressed on
the surface of the virus as capsid-fusion proteins. The most
important consequence of this arrangement is that there is a
physical linkage between phenotype and genotype. There are
several other advantages as well: 1) virus particles which
have been isolated from libraries by affinity selection can be
regenerated by simple bacterial infection, and 2) the primary
structure of the displayed binding peptide or protein can be
easily deduced by DNA sequencing of the cloned segment in the
viral genome.

Combinatorial peptide libraries have been expressed
in bacteriophage. Synthetic oligonucleotides, fixed in
length but with multiple unspecified codons, can be cloned
into genes III, VI, or VIII of bacteriophage M13 where they
are expressed as a plurality of peptide:capsid fusion
proteins. The libraries, often referred to as random peptide
libraries, can be screened for binding to target molecules of
interest. Usually, three to four rounds of screening can be
accomplished in a week's time, leading to the isolation of one
to hundreds of binding phage.

The primary structure of the binding peptides is
then deduced by nucleotide sequencing of individual clones.
Inspection of the peptide sequences sometimes reveals a common
motif, or consensus sequence. Generally, this motif, when
synthesized as a soluble peptide, has the full binding
activity. Random peptide libraries have successfully yielded
peptides that bind to the Fab site of antibodies (Cwirla et
al., 1990, Proc. Natl. Acad. Sci. USA 87:6378-6382; Scott and
Smith, 1990, Science 249:386-390), cell surface receptors
(Doorbar and Winter, 1994, J. Mol. Biol. 244:361-369; Goodson
et al., 1994, Proc. Natl. Acad. Sci. USA 91:7129-7133),
cytosolic receptors (Blond-Elguindi et al., 1993, Cell
75:717-728), intracellular proteins (Daniels and Lane, 1994,
J. Mol. Biol. 243:639-652; Dedman et al., 1993, J. Biol. Chem.
268:23025-23030; Sparks et al., 1994, J. Biol. Chem.
269:23853-23856), DNA (Krook et al., 1994, Biochem. Biophys.
Res. Commun. 204:849-854), and many other targets (Winter, 1994,
Drug Dev. Res. 33:71-89).

Most vital cellular processes are regulated by the
transmission of signals throughout the cell in the form of
complex interactions between proteins. As the study of signal
transduction, or the flow of information throughout the cell,
has broadened and matured, it has become apparent that these
protein-protein interactions are often mediated by modular
domains within signalling proteins. Src, both the first
proto-oncogene product and the first tyrosine kinase
discovered (Taylor and Shalloway, 1993, Current Opinion in
Genetics and Development 3:26-34), is the prototypic modular
domain-containing protein.

Src is a protein tyrosine kinase of 60 kilodaltons
and is located at the plasma membrane of cells. It was first
discovered in the 1970's to be the oncogenic element of Rous
sarcoma virus, and in the 1980's, it was appreciated to be a
component of the signal transduction system in animal cells.
However, since the identification of viral and cellular forms
of Src (i.e., v-Src and c-Src), their respective roles in
oncogenesis, normal cell growth, and differentiation have not
been completely understood.

In addition to its tyrosine kinase region (sometimes
called a Src Homology 1 domain), Src contains two regions that
have been found to have functionally and structurally
homologous counterparts in a large number of proteins. These
regions have been designated the Src Homology 2 (SH2) and Src
Homology 3 (SH3) domains. SH2 and SH3 domains are modular in
that they fold independently of the protein that contains
them, their secondary structure places N- and C-termini close
to one another in space, and they appear at variable locations
(anywhere from N- to C-terminal) from one protein to the next
(Cohen et al., 1995, Cell 80:237-248). SH2 domains have been
well-studied and are known to be involved in binding to
phosphorylated tyrosine residues (Pawson and Gish, 1992, Cell
71:359-362).

The Src-homology region 3 (SH3) of Src is a domain
that is 60-70 amino acids in length and is present in many
cellular proteins (Cohen et al., 1995, Cell 80:237-248;
Pawson, 1995, Nature 373:573-580). Within Src, the SH3 domain
is considered to be a negative inhibitory domain, because
c-Src can be activated (i.e., transforming) through mutations in
this domain (Jackson et al., 1993, Oncogene 8:1943-1956;
Seidel-Dugan et al., 1992, Mol. Cell. Biol. 12:1835-1845).

To deduce the binding specificity of the Abl SH3
domain, a group led by David Baltimore screened cDNA libraries
with radiolabeled GST-Abl SH3 fusion protein and identified
two binding cDNA clones (Cicchetti et al., 1992, Science
257:803-806). Both clones encoded proteins with proline-rich
regions that were later shown to be SH3 binding domains.

Subsequently, others have screened combinatorial
peptide libraries and identified peptides that bound to the
Src SH3 domain (Yu et al., 1994, Cell 76:933-945; Cheadle et
al., 1994, J. Biol. Chem. 269:24034-24039). Using the SH3
domain of Src, Sparks et al., 1994, J. Biol. Chem.
269:23853-23856 screened phage-display random peptide libraries and
identified a consensus peptide sequence that binds with
specificity and high affinity to the Src SH3 domain.

The consensus from these various studies is that the
optimal Src SH3 peptide ligand is RPLPPLP (SEQ ID NO:45).
Recently, the structures of the peptide-SH3 domain complexes
have been deduced by NMR and the peptides have been shown to
bind in two possible orientations with respect to the SH3
domain (Feng et al., 1994, Science 266:1241-1247; Lim et al.,
1994, Nature 372:375-379).

Since SH3 domains have been found to have such
important roles in the function of crucial signalling and
structural elements in the cell, a method of identifying
proteins containing SH3 regions is of great interest. In this
regard, it is important to note that such a method is
unavailable because of the low sequence similarity of modular
functional domains, including SH3. See, e.g., Figure 6, which
illustrates the minimal primary sequence homology among
various known SH3 domains.

Sequence homology searches can potentially identify
known proteins containing not yet recognized functional
domains of interest; however, sequence homology generally
needs to be >40% for this procedure to be successful.
Functional domains generally are less than 40% homologous, and
therefore many would be missed in a sequence homology search.
In addition, homology searches do not identify novel proteins;
they only identify proteins already defined by nucleotide or
amino acid sequence and present in the database.
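
The 40% cutoff discussed above can be made concrete with a small calculation. The following Python sketch is illustrative only and is not part of the disclosed method: it computes percent identity between two pre-aligned sequences, skipping gapped columns, and compares the result with a cutoff. The function names, the treatment of "." and "-" as gap characters, and the toy strings in the note below are assumptions introduced for illustration; practical homology searches rely on substitution matrices and alignment statistics rather than raw identity.

```python
# Characters treated as alignment gaps (an assumption for this sketch).
GAP_CHARS = {".", "-"}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    # With no comparable columns there is nothing to score.
    return 100.0 * matches / compared if compared else 0.0


def passes_homology_cutoff(aligned_a: str, aligned_b: str,
                           cutoff: float = 40.0) -> bool:
    """True when identity is strictly above the cutoff (40% by default)."""
    return percent_identity(aligned_a, aligned_b) > cutoff
```

For the made-up strings "AAAA." and "AATT.", four ungapped columns are compared and two match, so percent_identity returns 50.0. A pair of domains sharing 30% identity would fail the default cutoff, which is the situation the text describes for many functional domains.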

Another approach is to use hybridization techniques
using nucleotide probes to search expression libraries for
novel proteins. This method would have limited applicability
to finding novel proteins containing functional domains due to
the low sequence homology of the functional domains.

Methods for isolating partner proteins involved in
protein-protein interactions have generally focused on finding
a ligand to a protein that has been found and characterized.
Such approaches have included using anti-idiotypic antibodies
that mimic the known protein to screen cDNA expression
libraries for a binding ligand (Jerne, 1974, Ann. Immunol.
(Inst. Pasteur) 125c:373-389; Sudol, 1994, Oncogene
9:2145-2152). Skolnick et al., 1991, Cell 65:83-90 isolated a
binding partner for PI3-kinase by screening a cDNA expression
library with the 32P-labeled tyrosine phosphorylated carboxyl
terminus of the epidermal growth factor receptor (EGFR).

An easy method for isolating operationally defined
ligands involved in protein-protein interactions and for
optimally identifying an exhaustive set of modular
domain-containing proteins implicated in binding with the ligands
would be highly desirable.

If such a method were available, it would be
useful for the isolation of any polypeptide
having a functioning version of any functional domain of
interest. Such a general method would be of tremendous
utility in that whole families of related proteins, each with
its own version of the functional domain of interest, could be
identified. Knowledge of such related proteins would
contribute greatly to our understanding of various
physiological processes, including cell growth or death,
malignancy, and immune reactions, to name a few. Such a
method would also contribute to the development of
increasingly more effective therapeutic, diagnostic, or
prophylactic agents having fewer side effects.

According to the present invention, just such a
method is provided.

Regarding SH3 domain-containing proteins, the method
of the present invention will contribute greatly to our
understanding of cell growth (Zhu et al., 1993, J. Biol. Chem.
268:1775-1779; Taylor and Shalloway, 1994, Nature
368:867-871), malignancy (Wages et al., 1992, J. Virol. 66:1866-1874;
Bruton and Workman, 1993, Cancer Chemother. Pharmacol.
32:1-19), subcellular localization of proteins to the cytoskeleton
and/or cellular membranes (Weng et al., 1993, J. Biol. Chem.
268:14956-14963; Bar-Sagi et al., 1993, Cell 74:83-91), signal
transduction (Duchesne et al., 1993, Science 259:525-528),
cell morphology (Wages et al., 1992, J. Virol. 66:1866-1874;
McGlade et al., 1993, EMBO J. 12:3073-3081), neuronal
differentiation (Tanaka et al., 1993, Mol. Cell. Biol.
13:4409-4415), T cell activation (Reynolds et al., 1992, Oncogene
7:1949-1955), and cellular oxidase activity (McAdara and
Babior, 1993, Blood 82:A28).

Citation of a reference hereinabove shall not be
construed as an admission that such is prior art to the
present invention.

3. SUMMARY OF THE INVENTION

In general, the present invention is directed to a
method of using isolated, operationally defined ligands
involved in binding interactions for optimally identifying an
exhaustive set of compounds binding to such ligands. In one
embodiment, the isolated ligands are peptides involved in
specific protein-protein interactions and are used to identify
a set of novel modular domain-containing proteins that bind to
the ligands. Using this method, proteins sharing only modest
similarities but a common function can be found.

The present invention is directed to a method of
identifying a polypeptide or family of polypeptides having a
functional domain of interest. The basic steps of the method
comprise: (a) choosing a recognition unit or set of
recognition units having a selective affinity for a target
molecule with a functional domain of interest; (b) contacting
the recognition unit with a plurality of polypeptides; and
(c) identifying a polypeptide having a selective binding
affinity for the recognition unit, which polypeptide includes
the functional domain of interest or a functional equivalent
thereof.

In one particular embodiment of the invention,
exhaustive screening of proteins having a desired functional
domain involves an iterative process by which ligands or
recognition units for SH3 domains identified in the first
round of screening are used to detect SH3 domain-containing
proteins in successive expression library screens.
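
One way to picture this iterative process is as a loop that alternates between two kinds of screen until no new proteins appear. The Python sketch below is a hypothetical illustration, not the laboratory protocol of the invention. Here `screen_library` stands in for an expression library screen with one recognition unit, `find_units` stands in for a screen that yields recognition units for a newly found protein, and `max_rounds` is an arbitrary safety limit; all three names, and the toy identifiers in the note that follows, are assumptions made for this example.

```python
from typing import Callable, Iterable, Set


def exhaustive_screen(
    seed_units: Iterable[str],
    screen_library: Callable[[str], Set[str]],
    find_units: Callable[[str], Set[str]],
    max_rounds: int = 10,
) -> Set[str]:
    """Alternate library screens and ligand discovery until nothing new is found."""
    found: Set[str] = set()     # proteins identified so far
    used: Set[str] = set()      # recognition units already screened
    pending = set(seed_units)   # recognition units for the next round

    for _ in range(max_rounds):
        if not pending:
            break
        # Screen the library with every pending unit and keep only new proteins.
        new_proteins: Set[str] = set()
        for unit in pending:
            new_proteins |= screen_library(unit) - found
        used |= pending
        found |= new_proteins
        # Derive recognition units from the new proteins for the next round.
        pending = set()
        for protein in new_proteins:
            pending |= find_units(protein) - used
    return found
```

With toy data in which unit "u1" binds proteins P1 and P2, "u2" binds P2 and P3, and "u3" binds P4, and in which P2 yields "u2" and P3 yields "u3", starting from "u1" alone reaches all four proteins in three rounds. Limiting the loop to a single round returns only P1 and P2, which corresponds to a conventional one-pass screen.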

More particularly, the method of the present
invention includes choosing a recognition unit having a
selective affinity for a target molecule with a functional
domain of interest. With this recognition unit (particularly
under the multivalent recognition unit screening conditions
taught by the present invention), it has further been
discovered that a plurality of polypeptides from various
sources can be examined such that certain polypeptides having
a selective binding affinity for the recognition unit can be
identified. The polypeptides so identified have been shown to
include the functional domain of interest; that is, the
functional domains found are working versions that are capable
of displaying the same binding specificity as the functional
domain of interest. Hence, the polypeptides identified by the
present method also possess those attributes of the functional
domain of interest which allow these related polypeptides to
exhibit the same, similar, or analogous (but functionally
equivalent) selective affinity characteristics as the domain
of interest of the initial target molecule. By screening the
plurality of peptides for recognition unit binding, the
methods of the present invention circumvent the limitations of
conventional DNA-based screening methods and allow for the
identification of highly disparate protein sequences
possessing functionally equivalent functional domains.

In specific embodiments of the present invention,
the plurality of polypeptides is obtained from the proteins
present in a cDNA expression library. The specificity of the
polypeptides which bear the functional domain of interest or a
functional equivalent thereof for various peptides or
recognition units can subsequently be examined, allowing for a
greater understanding of the physiological role of particular
polypeptide/recognition unit interactions. Indeed, the
present invention provides a method of targeted drug discovery
based on the observed effects of a given drug candidate on the
interaction between a recognition unit-polypeptide pair or a
recognition unit and a "panel" of related polypeptides each
with a copy or a functional equivalent of (e.g., capable of
displaying the same binding specificity and thus binding to
the same recognition unit as) the functional domain of
interest.

The present invention also provides polypeptides
comprising certain amino acid sequences. Moreover, the
present invention also provides nucleic acids, including
certain DNA constructs comprising certain coding sequences.
Using the methods of the present invention, more than eighteen
different SH3 domain-containing proteins have been identified,
over half of which have not been previously described.

The present inventors have found, unexpectedly, that
the valency (i.e., whether it is a monomer, dimer, tetramer,
etc.) of the recognition unit that is used to screen an
expression library or other source of polypeptides apparently
has a marked effect upon the specificity of the recognition
unit-functional domain interaction. The present inventors
have discovered that recognition units in the form of small
peptides, in multivalent form, have a specificity that is
eased but not forfeited. In particular, biotinylated peptides
bound to a multivalent (believed to be tetravalent)
streptavidin-alkaline phosphatase complex have an unexpected
generic specificity. This allows such peptides to be used to
screen libraries to identify classes of polypeptides
containing functional domains that are similar but not
identical in sequence to the peptides' original target
functional domains.

The present invention also provides methods for
identifying potential new drug candidates (and potential lead
compounds) and determining the specificities thereof. For
example, knowing that a polypeptide with a functional domain
of interest and a recognition unit, e.g., a binding peptide,
exhibit a selective affinity for each other, one may attempt
to identify a drug that can exert an effect on the
polypeptide-recognition unit interaction, e.g., either as an
agonist or as an antagonist (inhibitor) of the interaction.
With this assay, then, one can screen a collection of
candidate "drugs" for the one exhibiting the most desired
characteristic, e.g., the most efficacious in disrupting the
interaction or in competing with the recognition unit for
binding to the polypeptide.
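
As a rough illustration of how readings from such an assay might be compared, the following Python sketch ranks candidates by percent inhibition of the binding signal. It is an assumption-laden example rather than the assay of the invention: the scoring formula, the function names, the optional background correction, and the candidate names and signal values in the note below are all hypothetical.

```python
from typing import Dict, List, Tuple


def percent_inhibition(signal_with_candidate: float,
                       signal_without_candidate: float,
                       background: float = 0.0) -> float:
    """Percent loss of binding signal caused by a candidate.

    Negative values mean the signal increased, i.e. binding was enhanced.
    """
    window = signal_without_candidate - background
    if window <= 0:
        raise ValueError("uninhibited signal must exceed background")
    return 100.0 * (signal_without_candidate - signal_with_candidate) / window


def rank_candidates(readings: Dict[str, float],
                    signal_without_candidate: float,
                    background: float = 0.0) -> List[Tuple[str, float]]:
    """Return (candidate, percent inhibition) pairs, strongest inhibitor first."""
    scored = [
        (name, percent_inhibition(signal, signal_without_candidate, background))
        for name, signal in readings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Given an uninhibited signal of 1.1, a background of 0.1, and candidate readings of 0.9, 0.3, and 0.6, the candidate with the 0.3 reading ranks first at roughly 80% inhibition. A candidate that raises the signal above the uninhibited value receives a negative score, which would flag it as enhancing rather than disrupting the interaction.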

In addition, the present invention also provides
certain assay kits and methods of using these assay kits for
screening drug candidates for their ability to affect the
binding of a polypeptide containing a functional domain to a
recognition unit. In a particular aspect of the present
invention, the assay kit comprises: (a) a polypeptide
containing a functional domain of interest; and (b) a
recognition unit having a selective binding affinity for the
polypeptide. Yet another assay kit may comprise a plurality
of polypeptides, each polypeptide containing a functional
domain of interest, in which the functional domain of interest
is a domain selected from the group consisting of an SH1, SH2,
SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc
finger, leucine zipper, and helix-turn-helix, and at least one
recognition unit having a selective affinity for each of the
plurality of polypeptides.

WO 96/31625

PCT/US96/04454

Other objects of the present invention will be 
apparent to those of ordinary skill upon further consideration 
of the following detailed description. 

4. DESCRIPTION OF THE FIGURES

Figure 1 is a schematic representation of the
general aspects of a method of identifying recognition units
exhibiting a selective affinity for a target molecule with a
functional domain of interest. In this illustration, the
target molecule is a polypeptide with an SH3 domain, and the
recognition units are peptides having a selective affinity for
the SH3 domain that are expressed in a phage-displayed
library.

Figure 2 illustrates the selectivities exhibited by
particular recognition units that bind to the Src SH3 domain
(in this case, two heptapeptides) for a "panel" of
polypeptides known to contain an SH3 domain. The
non-SH3-containing protein, GST, serves as control. RPLPPLP is
SEQ ID NO:45; APPVPPR is SEQ ID NO:203.

Figure 3 is a schematic representation of the
general method of identifying polypeptides with a functional
domain of interest by screening a plurality of polypeptides
using a suitable recognition unit. In the illustration, the
plurality of polypeptides is obtained from a cDNA expression
library, and the recognition units are SH3 domain-binding
peptides.

Figure 4 illustrates how an SH3 domain-binding
peptide can be used to identify other SH3 domain-containing
proteins. Shown is a schematic representation of the
progression from initial selection of a target molecule with a
functional domain of interest, choice of recognition unit, and
identification of polypeptides that have a selective affinity
for the recognition unit and include the functional domain of
interest or a functional equivalent thereof.

Figure 5 depicts filters from primary (Figure 5B)
and tertiary (Figure 5A) screens of a λcDNA library probed
with a biotinylated SH3-binding peptide recognition unit in
the form of a complex with streptavidin-alkaline phosphatase
(SA-AP). A mouse 16 day embryo cDNA library in λEXlox was
incubated with a multivalent complex formed between
biotinylated pSrcCII and SA-AP. The sites of peptide binding
were detected by incubation with BCIP
(5-bromo-4-chloro-3-indolyl-phosphate-p-toluidine salt) and NBT (nitroblue
tetrazolium chloride) for approximately five minutes.

Figure 6 shows an alignment of SH3 domains that
illustrates the minimal primary sequence homology among
various known SH3 domains. The amino acid sequences shown are
SEQ ID NOs:68-111.

Figure 7A is a schematic representation of a
population of functional domains represented by the circles.
"A" is a recognition unit specific to one circle only. B, on
the other hand, recognizes three domains, while B1 and B2
recognize only two each. Figure 7B illustrates an iterative
method whereby new recognition units are chosen based on
polypeptides uncovered with the first recognition unit(s).
These new recognition units lead to the identification of
other related polypeptides, etc., expanding the scope of the
study to increasingly diverse members of the related
population.

Figure 8 illustrates the binding specificity of
several SH3 domain recognition units. Biotinylated Class I
(pSrcCI) or Class II (pSrcCII) Src SH3 domain recognition
units, Crk SH3 domain recognition units (pCrk), PLCγ SH3
domain recognition units (pPLC), and Abl SH3 domain
recognition units (pAbl) were tested for binding to the
indicated GST-SH3 domain fusion proteins immobilized onto
duplicate microtiter plate wells. Recognition units are
listed along the left side of the figure; GST-SH3 domain
fusion proteins are listed along the bottom. Recognition
units were incubated either as multivalent complexes of
biotinylated peptides and streptavidin-horseradish peroxidase
(SA-HRP) (complexed) or as monovalent biotinylated peptides
(uncomplexed), followed by incubation with SA-HRP. Average
optical densities are shown.

Figure 9 shows a schematic of SH3 domain-containing
proteins isolated using the present invention. The name,
identity, type of screen, and number of individual clones
derived for each sequence are indicated. Diagrams are to
scale, with SH3 domains representing approximately 60 amino
acids. The abbreviations AR, P, CR, E/P, and SH2 represent
ankyrin repeats, proline-rich segments, Cortactin repeats,
glutamate/proline-rich segments, and Src homology 2 domains,
respectively. Flared ends represent putative translation
initiation sites for individual cDNAs. The Mouse, Human 1,
and Human 2 libraries correspond to mouse 16 day embryo, human
bone marrow, and human prostate cancer cDNA libraries,
respectively. For a description of the pSrcII and pCort
recognition units, see Section 6.1.

Figures 10A and 10B depict the sequence alignment of
SH3 domains in proteins isolated using the present invention.
The name and identity of each clone is indicated. Where
appropriate, multiple SH3 domains from the same polypeptide
are designated A, B, C, etc., from N- to C-terminal. Periods
indicate gaps introduced to maximize alignment of similar
residues. Positions corresponding to conserved residues shown
to be involved in ligand binding in the SH3 domains of Src and
Grb2/Sem5 (Tomasetto et al., 1995, Genomics 28:367-376) are
presented in bold and underlined, respectively. Primary
structures of SH3P1-8 and SH3P10-13 correspond to mouse,
SH3P15-18, Clone 5, 34, 40, 41, 45, 53, 55, 56, and 65 to
human, and SH3P9 and SH3P14 to mouse (m) or human (h) cDNA
clones. For sequence comparison, the sequence of the mouse
c-Src SH3 domain (GenBank accession number P41240) is shown.
The GenBank accession numbers for mouse Cortactin, SPY75/HS1,
Crk, and human MLN50, Lyn, Fyn, and Src are U03184, D42120,
S72408, X82456, M16038, P06241, and P41240, respectively. The
amino acid sequences shown are SEQ ID NOs:112-140.

Figure 11 depicts the specificity continuum
described in Section 5.2.1. "SA-AP peptide complex"
represents the multivalent (believed to be tetravalent)
complex of streptavidin-alkaline phosphatase and biotinylated
peptide described in that section.

Figure 12 depicts the results of experiments in
which peptide recognition units were synthesized and tested
for their ability to bind to novel SH3 domains described in
Sections 6.1 and 6.1.1. A minus indicates no binding; a plus
indicates binding, with the number of pluses indicating the
strength of binding. For further details, see Section 6.2.
The amino acid sequences shown are SEQ ID NOs:141-168.

Figure 13 depicts more data from the experiment
depicted in Figure 12. The amino acid sequences shown are SEQ
ID NOs:169-188.

Figure 14 illustrates the effect of preconjugation
with streptavidin-alkaline phosphatase on the affinity of
biotinylated peptides for SH3 domains. See Section 6.3.1 for
details.

Figure 15 illustrates the effect of preconjugation
with streptavidin-alkaline phosphatase on the specificity of
biotinylated peptides for GST-SH3 domain fusion proteins that
have been immobilized on nylon membranes. See Section 6.3.2
for details.

Figure 16 illustrates the effect of preconjugation
with streptavidin-alkaline phosphatase on the specificity of
biotinylated peptides for proteins containing SH3 domains
expressed by cDNA clones. See Section 6.3.3 for details.

Figure 17 illustrates a strategy for exhaustively
screening an expression library for SH3 domain-containing
proteins. A peptide recognition unit is generated by
screening a combinatorial peptide library for binders to an
SH3 domain expressed bacterially as a GST fusion protein.
This peptide is then used as a multivalent
streptavidin-biotinylated peptide complex to screen for a subset of the SH3
domain-containing proteins represented in a cDNA expression
library. A combinatorial library is once again used to
identify recognition units of SH3 domains identified in the
first expression library screen; these recognition units
identify overlapping sets of proteins from the expression
library. With multiple iterations of this process, it should
be possible to clone systematically all SH3 domains
represented in a given cDNA expression library.

Figure 18 depicts the nucleotide sequence of SH3P1,
mouse p53bp2 (SEQ ID NO:5).

Figure 19 depicts the amino acid sequence of SH3P1,
mouse p53bp2 (SEQ ID NO:6).

Figure 20 depicts the nucleotide sequence of SH3P2,
a novel mouse gene (SEQ ID NO:7).

Figure 21 depicts the amino acid sequence of SH3P2,
a novel mouse gene (SEQ ID NO:8).

Figure 22 depicts the nucleotide sequence of SH3P3,
a novel mouse gene (SEQ ID NO:9).

Figure 23 depicts the amino acid sequence of SH3P3,
a novel mouse gene (SEQ ID NO:10).

Figure 24 depicts the nucleotide sequence of SH3P4,
a novel mouse gene (SEQ ID NO:11).

Figure 25 depicts the amino acid sequence of SH3P4,
a novel mouse gene (SEQ ID NO:12).

Figure 26 depicts the nucleotide sequence of SH3P5,
mouse Cortactin (SEQ ID NO:13).

Figure 27 depicts the amino acid sequence of SH3P5,
mouse Cortactin (SEQ ID NO:14).

Figure 28 depicts the nucleotide sequence of SH3P6,
mouse MLN50 (SEQ ID NO:15).

Figure 29 depicts the amino acid sequence of SH3P6,
mouse MLN50 (SEQ ID NO:16).

Figure 30 depicts the nucleotide sequence of SH3P7,
a novel mouse gene (SEQ ID NO:17).

Figure 31 depicts the amino acid sequence of SH3P7,
a novel mouse gene (SEQ ID NO:18).

Figure 32 depicts the nucleotide sequence of SH3P8,
a novel mouse gene (SEQ ID NO:19).

Figure 33 depicts the amino acid sequence of SH3P8,
a novel mouse gene (SEQ ID NO:20).

Figure 34 depicts the nucleotide sequence of SH3P9,
a novel mouse gene (SEQ ID NO:21).

Figure 35 depicts the amino acid sequence of SH3P9,
a novel mouse gene (SEQ ID NO:22).

Figure 36 depicts the nucleotide sequence of SH3P9,
a novel human gene (SEQ ID NO:23).

Figure 37 depicts the amino acid sequence of SH3P9,
a novel human gene (SEQ ID NO:24).

Figure 38 depicts the nucleotide sequence of SH3P10,
mouse HS1 (SEQ ID NO:25).

Figure 39 depicts the amino acid sequence of SH3P10,
mouse HS1 (SEQ ID NO:26).

Figure 40 depicts the nucleotide sequence of SH3P11,
mouse Crk (SEQ ID NO:27).

Figure 41 depicts the amino acid sequence of SH3P11,
mouse Crk (SEQ ID NO:28).

Figure 42A depicts the nucleotide sequence from
positions 1-2600 of SH3P12, a novel mouse gene (a portion of
SEQ ID NO:29).

Figure 42B depicts the nucleotide sequence from
positions 2601-3335 of SH3P12, a novel mouse gene (a portion
of SEQ ID NO:29).

Figure 43 depicts the amino acid sequence of SH3P12,
a novel mouse gene (SEQ ID NO:30).

Figure 44 depicts the nucleotide sequence of SH3P13,
a novel mouse gene (SEQ ID NO:31).

Figure 45 depicts the amino acid sequence of SH3P13, 
a novel mouse gene (SEQ ID NO: 32). 

Figure 46A depicts the nucleotide sequence from 
5 positions 1-2400 of SH3P14, mouse H74 (a portion of SEQ ID 
NO:33) . 

Figure 46B depicts the nucleotide sequence from 
positions 2351-4091 of SH3P14, mouse H74 (a portion of SEQ ID 
10 NO:33) . 

Figure 47 depicts the amino acid sequence of SH3P14, 
mouse H74 (SEQ ID NO:34). 

15 Figure 48 depicts the nucleotide sequence of SH3P14, 

human H74 (SEQ ID NO:35). 

Figure 49 depicts the amino acid sequence of SH3P14, 
human H74 (SEQ ID NO: 36). 

20 

Figure 50 depicts the nucleotide sequence of SH3P17; 
a novel human gene (SEQ ID NO: 37). 

Figure 51 depicts the amino acid sequence of SH3P17, 
25 a novel human gene (SEQ ID NO: 38). 

Figure 52A depicts the nucleotide sequence of 
SH3P18, a novel human gene (SEQ ID NO:39). 

30 Figure 53 depicts the amino acid sequence of SH3P18, 

a novel human gene (SEQ ID N0:40)« 

Figure 54 depicts the nucleotide sequence of clone 
55, a novel human gene (SEQ ID NO:189). 

Figure 55 depicts the amino acid sequence of clone 
55, a novel human gene (SEQ ID NO:190). 

Figure 56 depicts the nucleotide sequence of clone
56, a novel human gene (SEQ ID NO:191).

Figure 57 depicts the amino acid sequence of clone 
56, a novel human gene (SEQ ID NO: 192).

Figure 58A depicts the nucleotide sequence from 
position 1-1720 of clone 65, a novel human gene (a portion of 
SEQ ID NO: 193). 

Figure 58B depicts the nucleotide sequence from 
position 1721-2873 of clone 65, a novel human gene (a portion 
of SEQ ID NO: 193).

Figure 59 depicts the amino acid sequence of clone
65, a novel human gene (SEQ ID NO: 194).

Figure 60 depicts the nucleotide sequence of clone 
34, a novel human gene (SEQ ID NO: 195). 

Figure 61A depicts a portion of the amino acid 
sequence of clone 34, a novel human gene (a portion of SEQ ID 
NO: 196).

Figure 61B depicts a portion of the amino acid
sequence of clone 34, a novel human gene (a portion of SEQ ID
NO: 196).

Figure 62 depicts the nucleotide sequence of clone 
41, a novel human gene (SEQ ID NO: 197).

Figure 63A depicts a portion of the amino acid 
sequence of clone 41, a novel human gene (a portion of SEQ ID 
NO: 198).

Figure 63B depicts a portion of the amino acid
sequence of clone 41, a novel human gene (a portion of SEQ ID
NO: 198).

Figure 64A depicts the nucleotide sequence of clone
53, a novel human gene (SEQ ID NO:199).

[Figure label and opening words illegible in source]
sequence of clone 53, a novel human gene (a portion of SEQ ID
NO:200).

Figure 64D depicts a portion of the amino acid
sequence of clone 53, a novel human gene (a portion of SEQ ID
NO:200).

Figures [illegible]A and [illegible]B depict the nucleotide sequence
(SEQ ID NO: [illegible]) and amino acid sequence (SEQ ID NO: [illegible]) of
clone 5, a novel human gene.

[Section heading and opening words illegible in source.]

[...] broadly to polypeptides having a functional domain of
interest and to [...] of identifying and using
these polypeptides. [...] is also directed to
a method of using [...] operationally defined ligands
[...] optimally identifying an
exhaustive set of [...] binding ligands and [...]
compounds, target [...]. [...] embodiment,
[...] of interest and to
methods of using these peptides. The detailed description
[...] invention further
[...] of ordinary skill who may be
interested in practicing particular aspects of the invention.

Accordingly, the [...]
comprised of [...] joined by peptide (i.e.,
amide) bonds and includes proteins and peptides. Hence, the
polypeptides of the present invention may have single or
multiple chains of covalently linked amino acids and may
further contain intrachain or interchain linkages comprised of
disulfide bonds. Some polypeptides may also form a subunit of
a multiunit macromolecular complex. Naturally, the
polypeptides can be expected to possess conformational 
preferences and to exhibit a three-dimensional structure. 
Both the conformational preferences and the three-dimensional 
structure will usually be defined by the polypeptide's primary 
(i.e., amino acid) sequence and/or the presence (or absence)
of disulfide bonds or other covalent or non-covalent 
intrachain or interchain interactions. 

The polypeptides of the present invention can be any 
size. As can be expected, the polypeptides can exhibit a wide
variety of molecular weights, some exceeding 150 to 200
kilodaltons (kD). Typically, the polypeptides may have a
molecular weight ranging from about 5,000 to about 100,000 
daltons. Still others may fall in a narrower range, for 
example, about 10,000 to about 75,000 daltons, or about 20,000 
to about 50,000 daltons.

The phrase "functional domain" refers to a region of 
a polypeptide which affords the capacity to perform a 
particular function of interest. This function may give rise 
to a biological, chemical, or physiological consequence that 
may be reversible or irreversible and which may include, but
not be limited to, protein-protein interactions (e.g., binding 
interactions) involving the functional domain, a change in the 
conformation or a transformation into a different chemical 
state of the functional domain or of molecules acted upon by 
the functional domain, the transduction of an intracellular or
intercellular signal, the regulation of gene or protein 
expression, the regulation of cell growth or death, or the 
activation or inhibition of an immune response. Furthermore, 
the functional domain of interest is defined by a particular 
functional domain that is present in a given target molecule.
A discussion of the selection of a particular functional 
domain-containing target molecule is presented further below. 

Many functional domains tend to be modular in that
such domains may occur one or more times in a given 
polypeptide (or target molecule) or may be found in a family 
of different polypeptides. When found more than once in a 
given polypeptide or in different polypeptides, the modular
functional domain may possess substantially the same
structure, in terms of primary sequence and/or three-
dimensional space, or may contain slight or great variations
or modifications among the different versions of the
functional domain of interest.

What is important, however, is that these related 
functional domains retain the functional aspects of the 
functional domain of interest present in the target molecule. 
It is stressed that, indeed, it is this functional
relationship among two or more possible versions of a
functional domain of interest which may be identified, 
defined, and exploited by the methods of the present 
invention. In a preferred aspect, the function of interest is 
the ability to bind to a molecule (e.g., a peptide) of
interest.

The present invention provides a general strategy by 
which recognition units that bind to a functional domain- 
containing molecule can be used to screen expression libraries 
of genes (e.g., cDNA, genomic libraries) systematically for
novel functional domain-containing proteins. In specific
embodiments, the recognition units are previously isolated from a
random peptide library, or are known peptide ligands or 
recognition units, or are recognition units that are 
identified by database searches for sequences having homology
to a peptide recognition unit having the binding specificity
of interest. Using the methods of the present invention, it 
is possible to exhaustively screen an expression library for 
proteins with a given functional domain. 

In the prior art, novel genes (and thus their
encoded protein products) are most commonly identified from
cDNA libraries. Generally, an appropriate cDNA library is
screened with a probe that is either an oligonucleotide or an
antibody. In either case, the probe must be specific enough
for the gene that is to be identified to pick that gene out
from a vast background of non-relevant genes in the library.
It is this need for a specific probe that is the highest
hurdle that must be overcome in the prior art identification
of novel genes. Another method of identifying genes from cDNA 
libraries is through use of the polymerase chain reaction 
(PCR) to amplify a segment of a desired gene from the library. 
PCR requires that oligonucleotides having sequence similarity 
to the desired gene be available.

If the probe used in prior art methods is a nucleic 
acid, the cDNA library may be screened without the need for 
expressing any protein products that might be encoded by the 
cDNA clones. If the probe used in prior art methods is an 
antibody, then it is necessary to build the cDNA library into
a suitable expression vector. For a comprehensive discussion 
of the art of identifying genes from cDNA libraries, see 
Sambrook, Fritsch, and Maniatis, "Construction and Analysis of 
cDNA Libraries," Chapter 8 in Molecular Cloning, A Laboratory Manual, 2d
ed., Cold Spring Harbor Laboratory Press, 1989. See also
Sambrook, Fritsch, and Maniatis, "Screening Expression
Libraries with Antibodies and Oligonucleotides," Chapter 12 in
Molecular Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor
Laboratory Press, 1989. 

As an alternative to cDNA libraries, genomic
libraries are used. When genomic libraries are used in prior
art methods, the probe is virtually always a nucleic acid 
probe. See Sambrook, Fritsch, and Maniatis, "Analysis and 
Cloning of Eukaryotic Genomic DNA," Chapter 9 in Molecular Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory
Press, 1989. 

In the prior art, nucleic acid probes used in 
screening libraries are often based upon the sequence of a 
known gene that is thought to be homologous to a gene that it 
is desired to isolate. The success of the procedure depends
upon the degree of homology between the probe and the target 
gene being sufficiently high. Probes based upon the sequences 
of known functional domains in proteins had limited value
because, while the sequences of the functional domains were 
similar enough to allow for their recognition as shared 
domains, the similarity was not so high that the probes could 
be used to screen cDNA or genomic libraries for genes
containing the functional domains. 

PCR may also be used to identify genes from genomic
libraries. However, as in the case of using PCR to identify
genes from cDNA libraries, this requires that oligonucleotides
having sequence similarity to the desired gene be available.

Using the screening methods provided by the present
invention, DNA encoding proteins having a desired functional
domain that would not be readily identified by sequence
homology can be identified by functional binding specificity
to recognition units. By virtue of an easing of the binding
specificity requirements conferred by the screening methods of the
present invention, many novel, functionally homologous,
functional domain-containing proteins can be identified.
Although not intending to be bound by any mechanistic
explanation, this easing of binding specificity is believed to
be the result of the use of a multivalent peptide recognition
unit to screen the gene library, preferably of a valency
greater than bivalent, more preferably tetravalent or greater,
and most preferably the streptavidin-biotinylated peptide
recognition unit complex.

In one particular embodiment of the invention, 
exhaustive screening of proteins having a desired functional 
domain involves an iterative process by which recognition 
units for SH3 domains identified in the first round of
screening are used to detect SH3 domain-containing proteins in
successive expression library screens (see Figure 17). This
strategy enables one to search "sequence space" in what might
be thought of as ever-widening circles with each successive
cycle. This iterative strategy can be initiated even when
only one functional domain-containing protein and recognition
unit are available. 
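
The iterative strategy described above can be pictured as a simple loop. The following Python sketch is illustrative only and is not part of the disclosed method: the functions select_units and screen_library are hypothetical stand-ins for the two laboratory screens (selecting recognition units against a known domain, and screening an expression library with those units), and the lookup tables hold made-up data.

```python
def iterative_screen(seed_protein, select_units, screen_library, max_rounds=3):
    """Alternate between finding recognition units for known proteins and
    finding new proteins bound by those units, for up to max_rounds cycles."""
    found = {seed_protein}
    frontier = [seed_protein]
    for _ in range(max_rounds):
        # Step 1: recognition units that bind the proteins found last round.
        units = set()
        for protein in frontier:
            units.update(select_units(protein))
        # Step 2: library proteins bound by any of those recognition units.
        new = set()
        for unit in units:
            new.update(screen_library(unit))
        new -= found
        if not new:
            break  # no further proteins discovered; the search has converged
        found |= new
        frontier = sorted(new)
    return found


# Toy stand-ins for the two wet-lab screens (hypothetical data).
UNITS_FOR = {"Src": ["pepA"], "P1": ["pepB"], "P2": []}
HITS_FOR = {"pepA": ["Src", "P1"], "pepB": ["P1", "P2"]}

result = iterative_screen(
    "Src",
    lambda protein: UNITS_FOR.get(protein, []),
    lambda unit: HITS_FOR.get(unit, []),
)
print(sorted(result))
```

In this toy run a single starting protein and its one recognition unit are enough to reach two further proteins over successive cycles, which mirrors the "ever-widening circles" description.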






This iterative process is not limited to proteins 
containing SH3 domains. Members within a class of other 
functional domains also tend to have overlapping, or at least 
similar recognition unit preferences, are structurally stable, 
and often confer similar binding properties to a wide variety
of proteins. These characteristics predict that the methods 
of the present invention will be applicable to a wide variety 
of functional domain-containing proteins in addition to their 
applicability to SH3 domain-containing proteins. 



5.1. Discovery of Novel Genes and Polypeptides Containing
Functional Domains

The present invention provides methods for the 
identification of one or more polypeptides (in particular, a 
"family" of polypeptides, including the target molecule) that 
contains a functional domain of interest that either 
corresponds to or is the functional equivalent of a functional 
domain of interest present in a predetermined target molecule. 

The present invention provides a mechanism for the 
rapid identification of genes (e.g., cDNAs) encoding virtually 
any functional domain of interest. By screening cDNA 
libraries or other sources of polypeptides for recognition 
unit binding rather than sequence similarity, the present 
invention circumvents the limitations of conventional DNA- 
based screening methods and allows for the identification of 
highly disparate protein sequences possessing equivalent 
functional activities. The ability to isolate entire 
repertoires of proteins containing particular modular 
functional domains will prove invaluable both in molecular 
biological investigations of the genome and in bringing new 
targets into drug discovery programs. 

It should likewise be apparent that a wide range of 
polypeptides having a functional domain of interest can be 
identified by the process of the invention, which process 
comprises: 

(a) contacting a multivalent recognition unit
complex with a plurality of polypeptides; and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

In a specific embodiment, the process comprises: 
(a) contacting a multivalent recognition unit 
complex with a plurality of polypeptides from which it is
desired to identify a polypeptide having selective binding 
affinity for the recognition unit, in which the valency of the 
recognition unit in the complex is at least two, or at least 
four; and 

(b) identifying, and preferably recovering, a
polypeptide having a selective binding affinity for the
recognition unit complex.

In another specific embodiment, the process
comprises a method of identifying at least one polypeptide
comprising a functional domain of interest, said method
comprising:

(a) contacting one or more multivalent recognition
unit complexes with a plurality of polypeptides; and

(b) identifying at least one polypeptide having
selective binding affinity for at least one of said
recognition unit complexes.

In another specific embodiment, the process 

comprises: 

(a) contacting a multivalent recognition unit 
complex, which complex comprises (i) avidin or streptavidin,
and (ii) biotinylated recognition units, with a plurality of 
polypeptides from a cDNA expression library, in which the 
recognition unit is a peptide having in the range of 6 to 60 
amino acid residues; and 

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

In another specific embodiment, the process 
comprises a method of identifying a polypeptide having an SH3 
domain of interest comprising: 

(a) contacting a multivalent recognition unit
complex, which complex comprises (i) avidin or streptavidin,
and (ii) biotinylated recognition units, with a plurality of
polypeptides from a cDNA expression library, in which the
recognition unit is a peptide having in the range of 6 to 60
amino acid residues and which selectively binds an SH3 domain;
and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

In another specific embodiment, the process 
comprises a method of identifying a polypeptide having a 
functional domain of interest or a functional equivalent 
thereof comprising:

(a) screening a random peptide library to identify a
peptide that selectively binds a functional domain of
interest; and

(b) screening a cDNA or genomic expression library
with said peptide or a binding portion thereof to identify a
polypeptide that selectively binds said peptide.

In a specific embodiment of the above method, the 
screening step (b) is carried out by use of said peptide in 
the form of multiple antigen peptides (MAP) or by use of said
peptide cross-linked to bovine serum albumin or keyhole limpet
hemocyanin.

In another specific embodiment, the process 
comprises a method of identifying a polypeptide having a 
functional domain of interest or a functional equivalent
thereof comprising:

(a) screening a random peptide library to identify a 
plurality of peptides that selectively bind a functional 
domain of interest; 

(b) determining at least part of the amino acid 
sequences of said peptides;

(c) determining a consensus sequence based upon the 
determined amino acid sequences of said peptides; and 

(d) screening a cDNA or genomic expression library
with a peptide comprising the consensus sequence to identify a
polypeptide that selectively binds said peptide.
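
Step (c) above, determining a consensus sequence, can be illustrated with a short Python sketch. This is a minimal example under stated assumptions and not the procedure used in the specification: the peptides are assumed to be already aligned and of equal length, the 60% agreement threshold and the "x" wildcard are arbitrary choices, and the example sequences are hypothetical.

```python
from collections import Counter


def consensus(peptides, min_fraction=0.6, wildcard="x"):
    """Per-position consensus of pre-aligned, equal-length peptide sequences.

    A position keeps its most common residue when that residue occurs in at
    least min_fraction of the peptides; otherwise the wildcard is emitted.
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be pre-aligned to equal length")
    out = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if count / len(peptides) >= min_fraction else wildcard)
    return "".join(out)


# Hypothetical selected peptides (not taken from the specification).
selected = ["RPLPPLP", "RALPPLP", "RSLPPLP", "KPLPPLP"]
print(consensus(selected))
```

A peptide built from the conserved positions of such a consensus could then serve as the probe in step (d).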

In another specific embodiment, the process 
comprises a method of identifying a polypeptide having a 
functional domain of interest or a functional equivalent
thereof comprising: 

(a) screening a random peptide library to identify a 
first peptide that selectively binds a functional domain of
interest;

(b) determining at least part of the amino acid 
sequence of said first peptide; 

(c) searching a database containing the amino acid 
sequences of a plurality of expressed natural proteins to
identify a protein containing an amino acid sequence
homologous to the amino acid sequence of said first peptide;
and 

(d) screening a cDNA or genomic expression library 
with a second peptide comprising the sequence of said protein
that is homologous to the amino acid sequence of said first
peptide. 
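
The database search of step (c) can be sketched in Python as follows. This is a deliberate simplification: it performs exact motif matching with wildcards rather than a true homology search with a scoring matrix, and the motif, protein names, and sequences are all hypothetical.

```python
import re


def motif_to_regex(motif, wildcard="x"):
    """Turn a motif such as 'RxLPPLP' into a compiled regular expression,
    treating the wildcard character as 'any single residue'."""
    return re.compile(
        "".join("." if c == wildcard else re.escape(c) for c in motif)
    )


def find_motif_hits(motif, proteins):
    """Return {protein name: [0-based start positions]} for every protein
    whose sequence contains the motif (non-overlapping matches only)."""
    pattern = motif_to_regex(motif)
    hits = {}
    for name, sequence in proteins.items():
        positions = [m.start() for m in pattern.finditer(sequence)]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical protein database (made-up sequences).
database = {
    "protA": "MKTRALPPLPEE",
    "protB": "MSSGGHHW",
}
hits = find_motif_hits("RxLPPLP", database)
print(hits)
```

The matching segment of a hit (here, residues starting at the reported position in protA) would correspond to the "second peptide" used to screen the expression library in step (d).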

The polypeptide identified by the above-
described methods thus should contain the functional domain of
interest or a functional equivalent thereof (that is, having a
functional domain that is identical, or having a functional
domain that differs in sequence but is capable of binding to
the same recognition unit). In a particular embodiment, the
polypeptide identified is a novel polypeptide. In a preferred
embodiment, the recognition unit that is used to form the
multivalent recognition unit complex is isolated or identified
from a random peptide library.

In a specific embodiment, the present invention 
provides amino acid sequences and DNA sequences encoding novel 
proteins containing SH3 domains. The SH3 domains vary in
sequence but retain binding specificity to an SH3 domain
recognition unit. Also provided are fragments and derivatives
of the novel proteins containing SH3 domains as well as DNA 
sequences encoding the same. It will be apparent to one of 
ordinary skill in the art that also provided are proteins that
vary slightly in sequence from the novel proteins by virtue of
conservative amino acid substitutions. It will also be
apparent to one of ordinary skill in the art that the novel
proteins may be expressed recombinantly by standard methods.
The novel proteins may also be expressed as fusion proteins 
with a variety of other proteins, e.g., glutathione S- 
transferase. 

The present invention provides a purified
polypeptide comprising an SH3 domain, said SH3 domain having
an amino acid sequence selected from the group consisting of:
SEQ ID NOs: 113-115, 118-121, 125-128, 133-139, 204-218, and
219. Also provided is a purified DNA encoding the
polypeptide.

Also provided is a purified polypeptide comprising 
an SH3 domain, said polypeptide having an amino acid sequence 
selected from the group consisting of SEQ ID NOs: 8, 10, 12, 
18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200,
and 221. Also provided is a purified DNA encoding the
polypeptide. 

Also provided is a purified DNA encoding an SH3 
domain, said DNA having a sequence selected from the group 
consisting of SEQ ID NOs: 7, 9, 11, 17, 19, 21, 23, 29, 31,
37, 39, 189, 191, 193, 195, 197, 199, and 220. Also provided
is a nucleic acid vector comprising this purified DNA. Also
provided is a recombinant cell containing this nucleic acid
vector.

Also provided is a purified DNA encoding a 
polypeptide having an amino acid sequence selected from the
group consisting of: SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 
30, 32, 38, 40, 190, 192, 194, 196, 198, 200, and 221. Also 
provided is a nucleic acid vector comprising this purified 
DNA. Also provided is a recombinant cell containing this 
nucleic acid vector.

Also provided is a purified DNA encoding a 
polypeptide comprising an amino acid sequence selected from 
the group consisting of: SEQ ID NOs:113-115, 118-121, 125-128, 
133-139, 204-218, and 219. Also provided is a nucleic acid 
vector comprising this purified DNA. Also provided is a
recombinant cell containing this nucleic acid vector. 

Also provided is a purified molecule comprising an
SH3 domain of a polypeptide having an amino acid sequence
selected from the group consisting of: SEQ ID NO: 8, 10, 12,
18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200,
and 221.

Also provided is a fusion protein comprising (a) an 
amino acid sequence comprising an SH3 domain of a polypeptide 
having the amino acid sequence of SEQ ID NO: 8, 10, 12, 18,
20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, or
221 joined via a peptide bond to (b) an amino acid sequence of
at least six, or ten, or twenty amino acids from a different 
polypeptide. Also provided is a purified DNA encoding the 
fusion protein. Also provided is a nucleic acid vector 
comprising the purified DNA encoding the fusion protein. Also
provided is a recombinant cell containing this nucleic acid
vector. Also provided is a method of producing this fusion 
protein comprising culturing a recombinant cell containing a 
nucleic acid vector encoding said fusion protein such that 
said fusion protein is expressed, and recovering the expressed
fusion protein.

The present invention also provides a purified 
nucleic acid hybridizable to a nucleic acid having a sequence 
selected from the group consisting of: SEQ ID NOs: 7, 9, 11, 
17, 19, 21, 23, 29, 31, 37, 39, 189, 191, 193, 195, 197, 199,
and 220.

The present invention also provides antibodies to a 
polypeptide having an amino acid sequence selected from the 
group consisting of: SEQ ID NOs:113-115, 118-121, 125-128, 
133-139, 204-218, and 219. 

The present invention also provides antibodies to a
polypeptide having an amino acid sequence selected from the
group consisting of SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 
32, 38, 40, 190, 192, 194, 196, 198, 200, and 221. 

It is demonstrated by way of example herein that
recognition units that comprise SH3 domain ligands derived
from combinatorial peptide libraries may be used in the
methods of the present invention as probes for the rapid
discovery of novel proteins containing SH3 functional domains.
The methods of the present invention require no prior
knowledge of the characteristics of an SH3 domain's natural
cellular ligand to initiate the process of discovery. One
needs only enough purified SH3 domain-containing protein (by
way of example, 1-5 µg) to select peptides from a random
peptide library. In addition, because the methods of the 
present invention identify novel proteins from cDNA expression 
libraries based only on their binding properties, low primary 
sequence identity between the target SH3 domain and the SH3
domains of the novel proteins discovered need not be a 
limitation, provided some functional similarity between these 
SH3 domains is conserved. Also, the methods of the present 
invention are rapid, require inexpensive reagents, and employ 
simple and well established laboratory techniques.

Using these methods, more than eighteen different 
SH3 domain-containing proteins have been identified, over half 
of which have not been previously described. While certain of 
these previously unknown proteins are clearly related to known 
genes such as amphiphysin and drebrin, others constitute new
classes of signal transduction and/or cytoskeletal proteins. 
These include SH3P17 and SH3P18, two members of a new family 
of adaptor-like proteins comprised of multiple SH3 domains; 
SH3P12, a novel protein with three SH3 domains and a region 
similar to the extracellular peptide hormone sorbin; and
SH3P4, SH3P8, and SH3P13, three members of a third new family
of SH3-containing proteins. These novel proteins are 
described more fully in Sections 6.1 and 6.1.1. The high 
incidence of novel proteins identified by the methods of the 
present invention indicates that a large number of SH3 domain-
containing proteins remain to be discovered by application of 
the methods of the invention. 

One of ordinary skill in the art would recognize
that the above-described novel proteins need not be used in
their entirety in the various applications of those proteins
described herein. In many cases it will be sufficient to
employ that portion of the novel protein that contains the
functional (e.g., SH3) domain. Such exemplary portions of SH3
domain-containing proteins are shown in Figures 10A and 10B.
Accordingly, the present invention provides derivatives (e.g.,
fragments and molecules comprising these fragments) of novel
proteins that contain SH3 domains, e.g., as shown in Figures
10A and 10B. Nucleic acids encoding these fragments or other
derivatives are also provided. 

In another embodiment, the present invention 
includes a method of identifying one or more novel 
polypeptides having an SH3 domain, said method comprising:

(a) identifying a recognition unit having a 
selective affinity for the SH3 domain by screening a peptide 
library with the SH3 domain; 

(b) producing said recognition unit; 

(c) contacting said recognition unit with a source
of polypeptides; and

(d) identifying one or more novel polypeptides 
having a selective affinity for said recognition unit, which 
polypeptides comprise the SH3 domain. 

5.1.1. Functional Domains

Functional domains of interest in the practice of 
the present invention can take many forms and may perform a 
variety of functions. For example, such functional domains
may be involved in a number of cellular, biochemical, or
physiological processes, such as cellular signal transduction,
transcriptional regulation, translational regulation, cell
adhesion, migration or transport, cytokine secretion and other
aspects of the immune response, and the like. In particular
embodiments of the present invention, the functional domains
of interest may consist of regions known as SH1, SH2, SH3, PH,
PTB, LIM, armadillo, and Notch/ankyrin repeat. See, e.g.,
Pawson, 1995, Nature 373:573-580; Cohen et al., 1995, Cell
80:237-248. Functional domains may also be chosen from among
regions known as zinc fingers, leucine zippers, and helix-
turn-helix or helix-loop-helix. Certain functional domains
may be binding domains, such as DNA-binding domains or actin-
binding domains. Still other functional domains may serve as
sites of catalytic activity. 

In one embodiment of the invention, a suitable 
target molecule containing the chosen functional domain of 
interest is selected. In the case of an SH3 domain, for
example, a number of proteins (or functional domain-containing
derivatives or analogs thereof) may be selected as the target
molecule, including but not limited to, the Src family of
proteins: Fyn, Lck, Lyn, Src, or Yes. Still other proteins
contain an SH3 domain and can be used, including, but not
limited to: Abl, Crk, Nck (other oncogenes), Grb2, PLCγ,
RasGAP (proteins involved in signal transduction), ABP-1,
myosin-1, spectrin (proteins found in the cytoskeleton), and
neutrophil NADPH oxidase (an enzyme). In the case of a
catalytic site, any catalytically active protein, such as an
enzyme, can be used, particularly one whose catalytic site is
known. For example, the catalytic site of the protein
glutathione S-transferase (GST) can be used. Other target
molecules that possess catalytic activity may include, but are
not limited to, protein serine/threonine kinases, protein
tyrosine kinases, serine proteases, DNA or RNA polymerases,
phospholipases, GTPases, ATPases, PI-kinases, DNA methylases,
metabolic enzymes, or protein glycosylases.

5.1.2. Recognition Units

By the phrase "recognition unit," is meant any 
molecule having a selective affinity for the functional domain 
of the target molecule and, preferably, having a molecular 
weight of up to about 20,000 daltons. In a particular
embodiment of the invention, the recognition unit has a
molecular weight that ranges from about 100 to about 10,000 
daltons. 

Accordingly, preferred recognition units of the 
present invention possess a molecular weight of about 100 to 
about 5,000 daltons, preferably from about 100 to about 2,000
daltons, and most preferably from about 500 to about 1,500 
daltons. As described further below, the recognition unit of 
the present invention can be a peptide, a carbohydrate, a
nucleoside, an oligonucleotide, any small synthetic molecule,
or a natural product. When the recognition unit is a peptide,
the peptide preferably contains about 6 to about 60 amino acid
residues.

When the recognition unit is a peptide, the peptide 
can have less than about 140 amino acid residues; preferably, 
the peptide has less than about 100 amino acid residues; 
preferably, the peptide has less than about 70 amino acid 
residues; preferably, the peptide has 20 to 50 amino acid
residues; most preferably, the peptide has about 6 to 60 amino
acid residues. 
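
To relate the residue counts above to the molecular weight ranges given earlier in this section, a rough conversion can be sketched in Python. The figure of 110 daltons per residue is a common rule-of-thumb average and is an assumption here, not a value taken from the specification; actual masses depend on composition.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This is an assumed approximation, not a value stated in the text.
AVG_RESIDUE_MASS_DA = 110.0


def approx_peptide_mass(n_residues):
    """Approximate mass in daltons of a peptide with n_residues residues."""
    return n_residues * AVG_RESIDUE_MASS_DA


for n in (6, 60, 140):
    print(n, "residues ~", approx_peptide_mass(n), "Da")
```

On this estimate, a peptide of 6 to 60 residues falls at roughly 660 to 6,600 daltons, which sits inside the "about 100 to about 10,000 daltons" range stated for the recognition unit, while a 140-residue peptide (about 15,400 daltons) remains under the 20,000 dalton upper bound.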

The peptide recognition units are preferably in the 
form of a multivalent peptide complex comprising avidin or 
streptavidin (optionally conjugated to a label such as
alkaline phosphatase or horseradish peroxidase) and 
biotinylated peptides. 

According to the present invention, a recognition 
unit (preferably in the form of a multivalent recognition unit
complex) is used to screen a plurality of expression products
of gene sequences containing nucleic acid sequences that are
present in native RNA or DNA (e.g., cDNA library, genomic
library).

The step of choosing a recognition unit can be 
accomplished in a number of ways that are known to those of
ordinary skill, including but not limited to screening cDNA 
libraries or random peptide libraries for a peptide that binds 
to the functional domain of interest. See, e.g., Yu et al., 
1994, Cell 76:933-945; Sparks et al., 1994, J. Biol. Chem.
269:23853-23856. Alternatively, a peptide or other small
molecule or drug may be known to those of ordinary skill to 
bind to a certain target molecule and can be used. The 
recognition unit can even be synthesized from a lead compound, 
which again may be a peptide, carbohydrate, oligonucleotide, 
small drug molecule, or the like. The recognition unit can
also be identified for use by doing searches (preferably via 
database) for molecules having homology for other, known 
recognition unit(s) having the ability to selectively bind to
the functional domain of interest. 

In a specific embodiment, the step of selecting a
recognition unit for use can be effected by, e.g., the use of
diversity libraries, such as random or combinatorial peptide
or nonpeptide libraries, which can be screened for molecules
that specifically bind to the functional domain of interest,
e.g., an SH3 domain. Many libraries are known in the art that
can be used, e.g., chemically synthesized libraries,
recombinant libraries (e.g., phage display libraries), and
in vitro translation-based libraries.

Examples of chemically synthesized libraries are
described in Fodor et al., 1991, Science 251:767-773; Houghten
et al., 1991, Nature 354:84-86; Lam et al., 1991,
Nature 354:82-84; Medynski, 1994, Bio/Technology 12:709-710;
Gallop et al., 1994, J. Medicinal Chemistry 37(9):1233-1251;
Ohlmeyer et al., 1993, Proc. Natl. Acad. Sci. USA
90:10922-10926; Erb et al., 1994, Proc. Natl. Acad. Sci. USA
91:11422-11426; Houghten et al., 1992, Biotechniques 13:412;
Jayawickreme et al., 1994, Proc. Natl. Acad. Sci. USA
91:1614-1618; Salmon et al., 1993, Proc. Natl. Acad. Sci. USA
90:11708-11712; PCT Publication No. WO 93/20242; and Brenner
and Lerner, 1992, Proc. Natl. Acad. Sci. USA 89:5381-5383.

Examples of phage display libraries are described in
Scott and Smith, 1990, Science 249:386-390; Devlin et al.,
1990, Science 249:404-406; Christian, R.B., et al., 1992, J.
Mol. Biol. 227:711-718; Lenstra, 1992, J. Immunol. Meth.
152:149-157; Kay et al., 1993, Gene 128:59-65; and PCT
Publication No. WO 94/18318 dated August 18, 1994.

In vitro translation-based libraries include but are
not limited to those described in PCT Publication No.
WO 91/05058 dated April 18, 1991; and Mattheakis et al., 1994,
Proc. Natl. Acad. Sci. USA 91:9022-9026.

By way of examples of nonpeptide libraries, a
benzodiazepine library (see, e.g., Bunin et al., 1994, Proc.
Natl. Acad. Sci. USA 91:4708-4712) can be adapted for use.
Peptoid libraries (Simon et al., 1992, Proc. Natl. Acad. Sci.
USA 89:9367-9371) can also be used. Another example of a
library that can be used, in which the amide functionalities
in peptides have been permethylated to generate a chemically
transformed combinatorial library, is described by Ostresh et
al. (1994, Proc. Natl. Acad. Sci. USA 91:11138-11142).

The variety of non-peptide libraries that are useful
in the present invention is great. For example, Ecker and
Crooke, 1995, Bio/Technology 13:351-360 list benzodiazepines,
hydantoins, piperazinediones, biphenyls, sugar analogs,
β-mercaptoketones, arylacetic acids, acylpiperidines,
benzopyrans, cubanes, xanthines, aminimides, and oxazolones as
among the chemical species that form the basis of various
libraries.

Non-peptide libraries can be classified broadly into
two types: decorated monomers and oligomers. Decorated
monomer libraries employ a relatively simple scaffold structure
upon which a variety of functional groups is added. Often the
scaffold will be a molecule with a known useful
pharmacological activity. For example, the scaffold might be
the benzodiazepine structure.

Non-peptide oligomer libraries utilize a large
number of monomers that are assembled together in ways that
create new shapes that depend on the order of the monomers.
Among the monomer units that have been used are carbamates,
pyrrolinones, and morpholinos. Peptoids, peptide-like
oligomers in which the side chain is attached to the α-amino
group rather than the α-carbon, form the basis of another
version of non-peptide oligomer libraries. The first
non-peptide oligomer libraries utilized a single type of monomer
and thus contained a repeating backbone. Recent libraries
have utilized more than one monomer, giving the libraries
added flexibility.

Screening the libraries can be accomplished by any
of a variety of commonly known methods. See, e.g., the
following references, which disclose screening of peptide
libraries: Parmley and Smith, 1989, Adv. Exp. Med. Biol.
251:215-218; Scott and Smith, 1990, Science 249:386-390;
Fowlkes et al., 1992, BioTechniques 13:422-427; Oldenburg et
al., 1992, Proc. Natl. Acad. Sci. USA 89:5393-5397; Yu et al.,
1994, Cell 76:933-945; Staudt et al., 1988, Science
241:577-580; Bock et al., 1992, Nature 355:564-566; Tuerk et al.,
1992, Proc. Natl. Acad. Sci. USA 89:6988-6992; Ellington et
al., 1992, Nature 355:850-852; U.S. Patent No. 5,096,815,
U.S. Patent No. 5,223,409, and U.S. Patent No. 5,198,346, all
to Ladner et al.; Rebar and Pabo, 1993, Science 263:671-673;
and PCT Publication No. WO 94/18318.

In a specific embodiment, screening to identify a
recognition unit can be carried out by contacting the library
members with an SH3 domain immobilized on a solid phase and
harvesting those library members that bind to the SH3 domain.
Examples of such screening methods, termed "panning"
techniques, are described in Parmley and
Smith, 1988, Gene 73:305-318; Fowlkes et al., 1992,
BioTechniques 13:422-427; PCT Publication No. WO 94/18318; and
in references cited hereinabove.

In another embodiment, the two-hybrid system for
selecting interacting proteins in yeast (Fields and Song,
1989, Nature 340:245-246; Chien et al., 1991, Proc. Natl.
Acad. Sci. USA 88:9578-9582) can be used to identify
recognition units that specifically bind to SH3 domains.

Where the recognition unit is a peptide, the peptide
can be conveniently selected from any peptide library,
including random peptide libraries, combinatorial peptide
libraries, or biased peptide libraries. The term "biased" is
used herein to mean that the method of generating the library
is manipulated so as to restrict one or more parameters that
govern the diversity of the resulting collection of molecules,
in this case peptides.

Thus, a truly random peptide library would generate
a collection of peptides in which the probability of finding a
particular amino acid at a given position of the peptide is
the same for all 20 amino acids. A bias can be introduced
into the library, however, by specifying, for example, that a
lysine occur every fifth amino acid or that positions 4, 8,
and 9 of a decapeptide library be fixed to include only
arginine. Clearly, many types of biases can be contemplated,
and the present invention is not restricted to any particular
bias. Furthermore, the present invention contemplates
specific types of peptide libraries, such as phage-displayed
peptide libraries and those that utilize a DNA construct
comprising a lambda phage vector with a DNA insert.
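
The distinction between a truly random and a biased library can be illustrated in silico. The following is a minimal sketch only, written in Python; the function names (random_peptide, lysine_every_fifth) and the use of a pseudo-random generator are assumptions made for illustration and do not describe how any library of the invention is actually constructed.

```python
import random

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, fixed=None, rng=random):
    """Return one peptide of the given length.

    `fixed` maps 1-based positions to a required residue (the bias).
    Positions not listed are drawn uniformly from all 20 amino acids.
    """
    fixed = fixed or {}
    residues = []
    for pos in range(1, length + 1):
        if pos in fixed:
            residues.append(fixed[pos])
        else:
            residues.append(rng.choice(AMINO_ACIDS))
    return "".join(residues)


def lysine_every_fifth(length):
    """Bias map placing lysine (K) at every fifth position."""
    return {pos: "K" for pos in range(5, length + 1, 5)}


# Bias from the text: positions 4, 8, and 9 of a decapeptide fixed to arginine (R).
decapeptide_bias = {4: "R", 8: "R", 9: "R"}

unbiased = random_peptide(10)                 # truly random decapeptide
biased = random_peptide(10, decapeptide_bias)  # arginine at positions 4, 8, 9
```

Passing no bias map reproduces the truly random case, in which every position is equally likely to hold any of the 20 amino acids.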

As mentioned above, in the case of a recognition
unit that is a peptide, the peptide may have about 6 to less
than about 60 amino acid residues, preferably about 6 to about
25 amino acid residues, and most preferably, about 6 to about
15 amino acids. In another embodiment, a peptide recognition
unit has in the range of 20-100 amino acids, or 20-50 amino
acids. In the case of a bile acid receptor, for example, the
recognition unit may be a bile acid, such as cholic acid or
cholesterol, and may have a molecular weight of about 300 to
about 600. If the functional domain relates to
transcriptional control, the recognition unit may be a portion
of a transcriptional factor, which may bind to a region of a
gene of interest or to an RNA polymerase. The recognition
unit may even be a nucleoside analog, such as cordycepin or
the triphosphate thereof, capable of inhibiting RNA
biosynthesis. The recognition unit may also be the
carbohydrate portion of a glycoprotein, which may have a
selective affinity for the asialoglycoprotein receptor, or the
repeating glucan unit that exhibits a selective affinity for a
cellulose binding domain or the active site of heparinase.

The selected recognition unit can be obtained by
chemical synthesis or recombinant expression. It is
preferably purified prior to use in screening a plurality of
gene sequences.

5.1.3. Screening a Source of Polypeptides

After the recognition unit is chosen for use, the
recognition unit is then contacted with a plurality of
polypeptides, preferably containing a functional domain. In a
particular embodiment of the invention, the plurality of
polypeptides is obtained from a polypeptide expression
library. The polypeptide expression library may be obtained,
in turn, from cDNA, fragmented genomic DNA, and the like. In
a specific embodiment, the library that is screened is a cDNA
library of total poly A+ RNA of an organism, in general, or of
a particular cell or tissue type or developmental stage or
disease condition or stage. The expression library may
utilize a number of expression vehicles known to those of
ordinary skill, including but not limited to, recombinant
bacteriophage, lambda phage, M13, a recombinant plasmid or
cosmid, and the like.

The plurality of polypeptides or the DNA sequences
encoding same may be obtained from a variety of natural or
unnatural sources, such as a procaryotic or a eucaryotic cell,
either a wild type, recombinant, or mutant. In particular,
the plurality of polypeptides may be endogenous to
microorganisms, such as bacteria, yeast, or fungi, to a virus,
to an animal (including mammals, invertebrates, reptiles,
birds, and insects) or to a plant cell.

In addition, the plurality of polypeptides may be
obtained from more specific sources, such as the surface coat
of a virion particle, a particular cell lysate, a tissue
extract, or they may be restricted to those polypeptides that
are expressed on the surface of a cell membrane.

Moreover, the plurality of polypeptides may be
obtained from a biological fluid, particularly from humans,
including but not limited to blood, plasma, serum, urine,
feces, mucus, semen, vaginal fluid, amniotic fluid, or
cerebrospinal fluid. The plurality of polypeptides may even
be obtained from a fermentation broth or a conditioned medium,
including all the polypeptide products secreted or produced by
the cells previously in the broth or medium.

The step of contacting the recognition unit with the
plurality of polypeptides may be effected in a number of ways.
For example, one may contemplate immobilizing the recognition
unit on a solid support and bringing a solution of the
plurality of polypeptides in contact with the immobilized
recognition unit. Such a procedure would be akin to an
affinity chromatographic process, with the affinity matrix
being comprised of the immobilized recognition unit. The
polypeptides having a selective affinity for the recognition
unit can then be purified by affinity selection. The nature
of the solid support, process for attachment of the
recognition unit to the solid support, solvent, and conditions
of the affinity isolation or selection procedure would depend
on the type of recognition unit in use but would be largely
conventional and well known to those of ordinary skill in the
art. Moreover, the valency of the recognition unit in the
recognition unit complex used to screen the polypeptides is
believed to affect the specificity of the screening step, and
thus the valency can be chosen as appropriate in view of the
desired specificity (see Sections 5.2 and 5.2.1).

Alternatively, one may also separate the plurality
of polypeptides into substantially separate fractions
comprising individual polypeptides. For instance, one can
separate the plurality of polypeptides by gel electrophoresis,
column chromatography, or like method known to those of
ordinary skill for the separation of polypeptides. The
individual polypeptides can also be produced by a transformed
host cell in such a way as to be expressed on or about its
outer surface. Individual isolates can then be "probed" by
the recognition unit, optionally in the presence of an inducer
should one be required for expression, to determine if any
selective affinity interaction takes place between the
recognition unit and the individual clone. Prior to
contacting the recognition unit with each fraction comprising
individual polypeptides, the polypeptides can optionally first
be transferred to a solid support for additional convenience.
Such a solid support may simply be a piece of filter membrane,
such as one made of nitrocellulose or nylon.

In this manner, positive clones can be identified
from a collection of transformed host cells of an expression
library, which harbor a DNA construct encoding a polypeptide
having a selective affinity for the recognition unit. The
polypeptide produced by the positive clone includes the
functional domain of interest or a functional equivalent
thereof. Furthermore, the amino acid sequence of the
polypeptide having a selective affinity for the recognition
unit can be determined directly by conventional means of amino
acid sequencing, or the coding sequence of the DNA encoding
the polypeptide can frequently be determined more conveniently
by use of standard DNA sequencing methods. The primary
sequence can then be deduced from the corresponding DNA
sequence.
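
Deducing the primary sequence from a DNA coding sequence amounts to reading the sequence three bases at a time against the standard genetic code. The following is a minimal sketch in Python; it assumes that the coding strand is supplied already in the correct reading frame, and the function name translate and the example sequence are hypothetical illustrations rather than sequences of the invention.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate an in-frame coding sequence until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the polypeptide
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical insert encoding Met-Ala-Pro-Pro-Leu-Pro followed by a stop.
example = translate("ATGGCTCCGCCGCTGCCGTAA")
```

Any trailing one or two bases that do not complete a codon are ignored by this sketch.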

If the amino acid sequence is to be determined from 
the polypeptide itself, one may use microsequencing 
techniques. The sequencing technique may include mass 
spectroscopy. 

In certain situations, it may be desirable to wash
away any unbound recognition unit from a mixture of the
recognition unit and the plurality of polypeptides prior to
attempting to determine or to detect the presence of a
selective affinity interaction (i.e., the presence of a
recognition unit that remains bound after the washing step).
Such a wash step may be particularly desirable when the
plurality of polypeptides is bound to a solid support.

As can be anticipated, the degree of selective
affinities observed varies widely, generally falling in the
range of about 1 nM to about 1 mM. In preferred embodiments
of the present invention, the selective affinity is on the
order of about 10 nM to about 100 µM, more preferably on the
order of about 100 nM to about 10 µM, and most preferably on
the order of about 100 nM to about 1 µM.
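
The nested affinity ranges recited above can be expressed as a simple lookup. The sketch below, in Python, is illustrative only: it assumes that the recited concentrations are dissociation constants expressed in molar units and that the upper limits of the preferred ranges are micromolar; the names TIERS and affinity_tier are hypothetical.

```python
# Ranges in mol/L, narrowest first so the most specific label wins.
# Treating the recited values as dissociation constants is an assumption.
TIERS = [
    ("most preferred", 100e-9, 1e-6),    # about 100 nM to about 1 uM
    ("more preferred", 100e-9, 10e-6),   # about 100 nM to about 10 uM
    ("preferred", 10e-9, 100e-6),        # about 10 nM to about 100 uM
    ("observed range", 1e-9, 1e-3),      # about 1 nM to about 1 mM
]


def affinity_tier(kd_molar):
    """Return the narrowest recited range that contains the given affinity."""
    for name, low, high in TIERS:
        if low <= kd_molar <= high:
            return name
    return "outside recited range"
```

For example, a value of 500 nM falls in the narrowest range, whereas 5 nM falls only within the generally observed range.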

5.2. Specificity of Recognition Units

A particular recognition unit may have fairly
generic selectivity for several members (e.g., three or four
or more) of a "panel" of polypeptides having the domain of
interest (or different versions of the domain of interest or
functional equivalents of the domain of interest) or a fairly
specific selectivity for only one or two, or possibly three,
of the polypeptides among a "panel" of same. Furthermore,
multiple recognition units, each exhibiting a range of
selectivities among a "panel" of polypeptides, can be used to
identify an increasingly comprehensive set of additional
polypeptides that include the functional domain of interest.

Hence, in a population of related polypeptides, the
functional domains of interest of each member may be
schematically represented by a circle. See, by way of
example, Figure 7A. The circle of one polypeptide may overlap
with that of another polypeptide. Such overlaps may be few or
numerous for each polypeptide. A particular recognition unit,
A, may recognize or interact with a portion of the circle of a
given polypeptide which does not overlap with any other
circle. Such a recognition unit would be fairly specific to
that polypeptide. On the other hand, a second recognition
unit, B, may recognize a region of overlap between two or more
polypeptides. Such a recognition unit would consequently be
less specific than the recognition unit A and may be
characterized as having a more generic specificity depending
on the number of polypeptides that it recognizes or interacts
with.

It should also be apparent to those of ordinary
skill that any number of B-type recognition units (B1, B2, B3,
etc.) can be present, each recognizing different "panels" of
polypeptides. Hence, the use of multiple recognition units
provides an increasingly more exhaustive population of
polypeptides, each of which exhibits a variation or evolution
in the functional domain of interest present in the initial
target molecule. It should also be apparent to one that the
present method can be applied in an iterative fashion, such
that the identification of a particular polypeptide can lead
to the choice of another recognition unit. See, e.g., Figure
7B. Use of this new recognition unit will lead, in turn, to
the identification of other polypeptides that contain
functional domains of interest that enhance the phenotypic
and/or genotypic diversity of the population of "related"
polypeptides.

Hence, with a given recognition unit, one may
observe interaction with only one or two different
polypeptides. With other recognition units, one may find
three, four, or more selective interactions. In the situation
in which only a single interaction is observed, it is likely,
though not mandatory, that the selective affinity interaction
is between the recognition unit and a replica of the initial
target molecule (or a molecule very similar structurally and
"functionally" to the initial target molecule).
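
The iterative use of recognition units described in this section can be pictured as a breadth-first expansion over a set of binding relationships. The following Python sketch is a toy model only; the dictionaries binds and suggests, the identifiers B1 to B3 and P1 to P4, and the function name iterative_screen are all hypothetical and stand in for experimental screening results.

```python
from collections import deque


def iterative_screen(start_unit, binds, suggests):
    """Collect every polypeptide reachable by repeated rounds of screening.

    binds:    recognition unit -> polypeptides it identifies in a screen
    suggests: polypeptide -> new recognition units chosen because of it
    """
    found = set()
    used = {start_unit}
    queue = deque([start_unit])
    while queue:
        unit = queue.popleft()
        for poly in binds.get(unit, ()):
            if poly in found:
                continue
            found.add(poly)
            # Each newly identified polypeptide may lead to further units.
            for next_unit in suggests.get(poly, ()):
                if next_unit not in used:
                    used.add(next_unit)
                    queue.append(next_unit)
    return found


# Hypothetical panels: B1 and B2 overlap at P2; P3 leads to a third unit.
binds = {"B1": {"P1", "P2"}, "B2": {"P2", "P3"}, "B3": {"P4"}}
suggests = {"P2": ["B2"], "P3": ["B3"]}
panel = iterative_screen("B1", binds, suggests)
```

In this toy model, a single starting unit that directly identifies only two polypeptides ultimately yields all four once the newly chosen units are screened in turn.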

5.2.1. Effect of the Presentation of the Recognition
Unit Complex on the Specificity of the
Recognition Unit-Functional Domain Interaction

The present inventors have found, unexpectedly, that
the valency (i.e., whether it is a monomer, dimer, tetramer,
etc.) of the recognition unit that is used to screen an
expression library or other source of polypeptides apparently
has a marked effect upon which genes or polypeptides are
identified from the expression library or source of
polypeptides. In particular, the specificity of the
recognition unit-functional domain interaction appears to be
affected by the valency of the recognition unit in the
screening process. By this specificity is meant the
selectivity in the functional domains to which the recognition
unit will bind in the screening step.

As discussed above, in one embodiment, recognition
units are obtained by screening a source of recognition units,
e.g., a phage display library, for recognition units that bind
to a particular target functional domain. Alternatively,
database searches for recognition units with sequence homology
to known recognition units can be employed. Of course, if a
recognition unit for a particular target functional domain is
already known, there is no need to screen a library or other
source of recognition units; one can merely synthesize that
particular recognition unit. The recognition unit, however
obtained, is then used to screen an expression library or
other source of polypeptides, to identify polypeptides that
the recognition unit binds to. A recognition unit that
identifies only its target functional domain is a recognition
unit that is completely specific. A recognition unit that
identifies one or two other polypeptides that do not contain
identically the target functional domain, from among a
plurality of polypeptides (e.g., of greater than 10*, 10*, or
10^ complexity), in addition to identifying a molecule
comprising its target functional domain, is very or highly
specific. A recognition unit that identifies most other
polypeptides present that do not contain its target functional
domain, in addition to identifying its target functional
domain, is a non-specific recognition unit. In between very
specific recognition units and non-specific recognition units,
the present inventors have discovered that there are
recognition units that recognize a small number of molecules
having functional domains other than their target functional
domains. These recognition units are said to have generic
specificity.
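
The four categories defined above can be summarized as a classification by the number of polypeptides, other than the target, that a recognition unit identifies. The Python sketch below is illustrative only. The text gives no numerical boundary for "most other polypeptides"; treating it as more than half of the non-target polypeptides is an assumption, as are the function and parameter names.

```python
def classify_specificity(hits, target, library_size):
    """Classify a recognition unit by its off-target identifications.

    hits:         identifiers of all polypeptides identified in the screen
    target:       identifier of the polypeptide bearing the target domain
    library_size: total number of distinct polypeptides screened
    """
    off_target = len(set(hits) - {target})
    if off_target == 0:
        return "completely specific"
    if off_target <= 2:
        return "very specific"
    # Assumption: "most other polypeptides" means more than half of them.
    if off_target > (library_size - 1) / 2:
        return "non-specific"
    return "generic"
```

For example, a unit that identifies its target plus five other polypeptides from a source of 1000 would be classed as having generic specificity under these assumptions.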

Thus, there is a "specificity continuum", from
completely and very specific through generic to non-specific,
that a recognition unit may evince. See Figure 11 for a
depiction of this specificity continuum. The Applicants have
discovered that a major factor influencing the specificity
exhibited by a recognition unit appears to be the valency of
the recognition unit in the complex used to screen the
expression library.

Usually, high specificity is considered to be
desirable when screening a library. High specificity is
exhibited, e.g., by affinity purified polyclonal antisera
which, in general, are very specific. Monoclonal antibodies
are also very specific. Small peptides in monovalent form, on
the other hand, generally give very weak, non-specific signals
when used to screen a library; thus, they are considered to be
non-specific.

The present inventors have discovered that
recognition units in the form of small peptides, in
multivalent form, have a specificity midway between the high
specificity of antibodies and the low/non-specificity of
monovalent peptides. Multivalency of the recognition unit of
at least two, in a recognition unit complex used to screen the
gene library, is preferred, with a multivalency of at least
four more preferred, to obtain a screening wherein specificity
is eased but not forfeited. In particular, a multivalent
(believed to be tetravalent) recognition unit complex
comprising streptavidin or avidin (preferably conjugated to a
label, e.g., an enzyme such as alkaline phosphatase or
horseradish peroxidase, or a fluorogen, e.g., green fluorescent
protein) and biotinylated peptide recognition units has an
unexpected generic specificity. This allows such peptides to
be used to screen libraries to identify classes of
polypeptides containing functional domains that are similar
but not identical to the peptides' target functional domains.
These classes of polypeptides are identified despite the low
level of homology at the amino acid level of the functional
domains of the members of the classes.

In another specific embodiment, multivalent peptide
recognition units may be in the form of multiple antigen
peptides (MAP) (Tam, 1989, J. Imm. Meth. 124:53-61; Tam, 1988,
Proc. Natl. Acad. Sci. USA 85:5409-5413). In this form, the
peptide recognition unit is synthesized on a branching lysyl
matrix using solid-phase peptide synthesis methods.
Recognition units in the form of MAP may be prepared by
methods known in the art (Tam, 1989, J. Imm. Meth. 124:53-61;
Tam, 1988, Proc. Natl. Acad. Sci. USA 85:5409-5413), or, for
example, by a stepwise solid-phase procedure on MAP resins
(Applied Biosystems), utilizing methodology established by the
manufacturer. MAP peptides may be synthesized comprising
(recognition unit peptide)2Lys1, (recognition unit
peptide)4Lys3, (recognition unit peptide)8Lys7, or more levels of
branching.
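
The stoichiometry of a MAP follows from the fact that each lysine in the core offers two amino groups (alpha and epsilon) for further coupling, so that each level of branching doubles the number of attachment points. A minimal sketch in Python, assuming a fully and symmetrically branched core and reading the compositions recited above as 2/1, 4/3, and 8/7; the function name map_composition is hypothetical.

```python
def map_composition(levels):
    """Return (peptide copies, core lysines) for a fully branched MAP.

    Each branching level doubles the number of free amino groups, so
    `levels` levels carry 2**levels peptides on 2**levels - 1 lysines.
    """
    if levels < 1:
        raise ValueError("a MAP needs at least one level of branching")
    peptides = 2 ** levels
    lysines = peptides - 1
    return peptides, lysines


# One, two, and three levels give the 2/1, 4/3, and 8/7 compositions.
compositions = [map_composition(n) for n in (1, 2, 3)]
```

A fourth level of branching would, on the same reasoning, carry sixteen peptide copies on fifteen lysines.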

The multivalent peptide recognition unit complexes
may also be prepared by cross-linking the peptide to a carrier
protein, e.g., bovine serum albumin (BSA), keyhole limpet
hemocyanin (KLH), or an enzyme, by use of known cross-linking
reagents. Such cross-linked peptide recognition units may be
detected by, e.g., an antibody to the carrier protein or
detection of the enzymatic activity of the carrier protein.

Furthermore, the present inventors have discovered
what specificity is exhibited by various types of recognition
units and their complexes, i.e., where these recognition units
and their complexes fall in the specificity continuum. The
present inventors have discovered a range of formats for
presenting recognition units used to screen libraries. For
example, the present inventors have determined that a peptide
in the form of a bivalent fusion protein with alkaline
phosphatase is very specific. The same peptide in the form of
a fusion protein with the pIII protein of an M13 derived
bacteriophage, expressed on the phage surface, has somewhat
less, though still high, specificity. That same peptide when
biotinylated in the form of a tetravalent
streptavidin-alkaline phosphatase complex has generic specificity. Use of
such a generically specific peptide permits the identification
of a wide range of proteins from expression libraries or other
sources of polypeptides, each protein containing an example of
a particular functional domain.

Accordingly, the present invention provides a method
of modulating the specificity of a peptide such that the
peptide can be used as a recognition unit to screen a
plurality of polypeptides, thus identifying polypeptides that
have a functional domain. In a specific embodiment,
specificity is generic so as to provide for the identification
of polypeptides having a functional domain that varies in
sequence from that of the target functional domain known to
bind the recognition unit under conditions of high
specificity. In a particular embodiment, the method comprises
forming a tetravalent complex of the biotinylated peptide and
streptavidin-alkaline phosphatase prior to use for screening
an expression library.

5.3. Kits

The present invention is also directed to an assay
kit which can be useful in the screening of drug candidates.
In a particular embodiment of the present invention, an assay
kit is contemplated which comprises in one or more containers
(a) a polypeptide containing a functional domain of interest;
and (b) a recognition unit having a selective affinity for the
polypeptide. The kit optionally further comprises a detection
means for determining the presence of a
polypeptide-recognition unit interaction or the absence thereof.

In a specific embodiment, either the polypeptide
containing the functional domain or the recognition unit is
labeled. A wide range of labels can be used to advantage in
the present invention, including but not limited to
conjugating the recognition unit to biotin by conventional
means. Alternatively, the label may comprise a fluorogen, an
enzyme, an epitope, a chromogen, or a radionuclide.
Preferably, the biotin is conjugated by covalent attachment to
either the polypeptide or the recognition unit. The
polypeptide or, preferably, the recognition unit is
immobilized on a solid support. The detection means employed
to detect the label will depend on the nature of the label and
can be any known in the art, e.g., film to detect a
radionuclide; an enzyme substrate that gives rise to a
detectable signal to detect the presence of an enzyme;
antibody to detect the presence of an epitope, etc.

A further embodiment of the assay kit of the present
invention includes the use of a plurality of polypeptides,
each polypeptide containing a functional domain of interest.
The assay kit further comprises at least one recognition unit
having a selective affinity for each of the plurality of
polypeptides and a detection means for determining the
presence of a polypeptide-recognition unit interaction or the
absence thereof.

A kit is provided that comprises, in one or more
containers, a first molecule comprising an SH3 domain and a
second molecule that binds to the SH3 domain, i.e., a
recognition unit, where the SH3 domain is a novel SH3 domain
identified by the methods of the present invention.

In a specific embodiment, the present invention
provides an assay kit comprising in one or more containers:

(a) a purified polypeptide containing a functional
domain of interest, in which the functional domain of interest
is a domain selected from the group consisting of an SH1, SH2,
SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc finger,
leucine zipper, and helix-turn-helix; and

(b) a purified recognition unit having a selective
binding affinity for said functional domain in said
polypeptide.

In the above assay kit, the polypeptide may comprise
an amino acid sequence selected from the group consisting of
SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190,
192, 194, 196, 198, 200, 221, 113-115, 118-121, 125-128,
133-139, 204-218, and 219.

In the above assay kit, the polypeptide may comprise
an amino acid sequence selected from the group consisting of
SEQ ID NOs: 6, 14, 16, 26, 28, 34, 36, 112, 116, 117, 122-124,
129-132, and 140.

In other embodiments of the above-described assay
kit, the recognition unit may be a peptide. The recognition
unit may be labeled with, e.g., an enzyme, an epitope, a
chromogen, or biotin.

In another specific embodiment, the present
invention provides an assay kit comprising in containers:

(a) a plurality of purified polypeptides, each
polypeptide in a separate container and each polypeptide
containing a functional domain of interest in which the
functional domain of interest is a domain selected from the
group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo,
Notch/ankyrin repeat, zinc finger, leucine zipper, and
helix-turn-helix; and

(b) at least one recognition unit having a
selective binding affinity for said functional domain in each
of said plurality of polypeptides.

The present invention also provides an assay kit
comprising in one or more containers:

(a) a plurality of purified polypeptides, each
polypeptide in a separate container and each polypeptide
containing an SH3 domain; and

(b) at least one peptide having a selective
affinity for the SH3 domain in each of said plurality of
polypeptides.

The present invention also provides a kit comprising
a plurality of purified polypeptides comprising a functional
domain of interest, each polypeptide in a separate container,
and each polypeptide having a functional domain of a different
sequence but capable of displaying the same binding
specificity.

In the above-described kits, the polypeptides may
have an amino acid sequence selected from the group consisting
of: SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190,
192, 194, 196, 198, 200, 221.

In the above-described kits, the functional domain
may be an SH3 domain.

The molecular components of the kits are preferably
purified.

The kits of the present invention may be used in the
methods for identifying new drug candidates and determining
the specificities thereof that are described in Section 5.4.

5.4. Assays for the Identification of Potential Drug
Candidates and Determining the Specificity Thereof

The present invention also provides methods for
identifying potential drug candidates (and lead compounds) and
determining the specificities thereof. For example, knowing
that a polypeptide with a functional domain of interest and a
recognition unit, e.g., a binding peptide, exhibit a selective
affinity for each other, one may attempt to identify a drug
that can exert an effect on the polypeptide-recognition unit
interaction, e.g., either as an agonist or as an antagonist
(inhibitor) of the interaction. With this assay, one can
screen a collection of candidate "drugs" for the one
exhibiting the most desired characteristic, e.g., the most
efficacious in disrupting the interaction or in competing with
the recognition unit for binding to the polypeptide.
Alternatively, one may utilize the different
selectivities that a particular recognition unit may exhibit
for different polypeptides bearing the same, similar, or
functionally equivalent functional domains. Thus, one may
tailor the screen to identify drug candidates that exhibit
more selective activities directed to specific polypeptide-
recognition unit interactions, among the "panel" of
possibilities. Thus, for example, a drug candidate may be
screened to identify the presence or absence of an effect on
particular binding interactions, potentially leading to
undesirable side effects.

Indeed, an intriguing application of the present
invention is described as follows. A known antiviral agent,
FIAU (a halogenated nucleoside analog), is effective at given
dosages against the virus that causes hepatitis B. This
compound is suspected of causing toxic side effects, however,
which give rise to liver failure in certain patients to whom
the drug is administered. According to the present invention,
an assay is provided which can be used to develop a new
generation of FIAU-derived drug that maintains its
effectiveness against viral replication while reducing liver
toxicity. Such an assay is provided by choosing FIAU as a
recognition unit having a selective affinity for a polypeptide
present in the hepatitis B virus or a cell infected with the
virus. This polypeptide or family of polypeptides having the
functional domain of interest is obtained by allowing the
chosen recognition unit, FIAU, to come into contact with an
expression library comprised of the hepatitis B virus genome
and/or a cDNA expression library of infected cells, according
to the methods of the present invention.

Likewise, the chosen recognition unit is allowed to
come into contact with a plurality of polypeptides obtained
from a sample of a human liver extract or of noninfected
hepatocytes. In this manner, a "panel" of polypeptides each
of which exhibits a selective affinity for the chosen
recognition unit is identified. As described above, this
panel is used to determine the activities of drug (FIAU)
homologs, analogs, or derivatives in terms of, say, selective
inhibition of viral polypeptide-FIAU interaction versus liver
polypeptide-FIAU interaction. Hence, those drug homologs,
analogs, or derivatives that maintain a selective affinity for
the viral polypeptide (or infected cell polypeptide) while
failing to interact with or having a minimal binding affinity
for liver polypeptides (and, hence, have reduced toxicity in
the liver due to elimination of undesirable molecular
interactions) can be identified and selected. Additional
iterations of this process can be performed if so desired.
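
The selection criterion described above can be sketched as a simple
filter. The following is a minimal illustration only, assuming that each
analog has already been assigned a relative binding value (0 to 1)
against the viral polypeptide and against each liver polypeptide of the
panel. The analog names, the binding values, and both thresholds are
hypothetical and are not taken from the present disclosure.

```python
def select_analogs(panel, min_viral=0.5, max_liver=0.1):
    """Return analogs that retain viral binding and show minimal liver binding.

    panel maps analog name -> {"viral": float, "liver": [float, ...]}.
    All values and thresholds are hypothetical illustration values.
    """
    selected = []
    for name, binding in panel.items():
        keeps_viral = binding["viral"] >= min_viral
        spares_liver = all(value <= max_liver for value in binding["liver"])
        if keeps_viral and spares_liver:
            selected.append(name)
    return sorted(selected)


# Hypothetical relative binding values for three made-up analogs.
example_panel = {
    "analog_1": {"viral": 0.9, "liver": [0.05, 0.02]},
    "analog_2": {"viral": 0.8, "liver": [0.40, 0.01]},
    "analog_3": {"viral": 0.2, "liver": [0.01, 0.01]},
}
print(select_analogs(example_panel))
```

Analogs passing the filter could then serve as the starting point for the
additional iterations mentioned above.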
Therefore, the present invention contemplates an
assay for screening a drug candidate comprising: (a) allowing
at least one polypeptide comprising a functional domain of
interest to come into contact with at least one recognition
unit having a selective affinity for the polypeptide in the
presence of an amount of a drug candidate, such that the
polypeptide and the recognition unit are capable of
interacting when brought into contact with one another in the
absence of said drug candidate, and in which the functional
domain of interest is a domain selected from the group
consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo,
Notch/ankyrin repeat, zinc finger, leucine zipper, and helix-
turn-helix; and (b) determining the effect, if any, of the
presence of the amount of the drug candidate on the
interaction of the polypeptide with the recognition unit.
In one embodiment, the effect of the drug candidate
upon multiple, different interacting polypeptide-recognition
unit pairs is determined in which at least some of said
polypeptides have a functional domain that differs in sequence
but is capable of displaying the same binding specificity as
the functional domain in another of said polypeptides.

In another embodiment, at least one of said at least
one polypeptide or recognition unit contains a consensus
functional domain and consensus recognition unit,
respectively.

In another embodiment, the drug candidate is an
inhibitor of the polypeptide-recognition unit interaction that
is identified by detecting a decrease in the binding of
polypeptide to recognition unit in the presence of such
inhibitor.

In another embodiment, said polypeptide is a
polypeptide containing an SH3 domain produced by a method
comprising:

(i) screening a peptide library with an SH3 domain
to obtain one or more peptides that bind the SH3 domain;

(ii) using one of the peptides from step (i) to
screen a source of polypeptides to identify one or more
polypeptides containing an SH3 domain;

(iii) determining the amino acid sequence of the
polypeptides identified in step (ii); and

(iv) producing the one or more novel polypeptides
containing an SH3 domain.

In another embodiment, said polypeptide is a
polypeptide containing an SH3 domain produced by a method
comprising:

(i) screening a peptide library with an SH3 domain
to obtain a plurality of peptides that bind the SH3 domain;

(ii) determining a consensus sequence for the
peptides obtained in step (i);

(iii) producing a peptide comprising the consensus
sequence;

(iv) using the peptide comprising the consensus
sequence to screen a source of polypeptides to identify one or
more polypeptides containing an SH3 domain;

(v) determining the amino acid sequence of the
polypeptides identified in step (iv); and

(vi) producing the one or more polypeptides
containing an SH3 domain.

In a preferred embodiment, the effect of the drug
candidate upon multiple, different interacting polypeptide-
recognition unit pairs is determined in which preferably at
least some (e.g., at least 2, 3, 4, 5, 7, or 10) of said
polypeptides have functional domains that vary in sequence yet
are capable of displaying the same binding specificity, i.e.,
binding to the same recognition unit. In another specific
embodiment, at least one of said polypeptides and/or
recognition units contains a consensus functional domain and
recognition unit, respectively (and thus are not known to be
naturally expressed proteins). In one embodiment, the
polypeptide is a novel polypeptide identified by the methods
of the present invention. In a specific embodiment, an
inhibitor of the polypeptide-recognition unit interaction is
identified by detecting a decrease in the binding of
polypeptide to recognition unit in the presence of such
inhibitor.

A common problem in the development of new drugs is
that of identifying a single, or a small number, of compounds
that possess a desirable characteristic from among a
background of a large number of compounds that lack that
desired characteristic. This problem arises both in the
testing of compounds that are natural products from plant,
animal, or microbial sources and in the testing of man-made
compounds. Typically, hundreds, or even thousands, of
compounds are randomly screened by the use of in vitro assays
such as those that monitor the compound's effect on some
enzymatic activity, its ability to bind to a reference
substance such as a receptor or other protein, or its ability
to disrupt the binding between a receptor and its ligand.

The compounds which pass this original screening
test are known as "lead" compounds. These lead compounds are
then put through further testing, including, eventually, in
vivo testing in animals and humans, from which the promise
shown by the lead compounds in the original in vitro tests is
either confirmed or refuted. See Remington's Pharmaceutical
Sciences, 1990, A.R. Gennaro, ed., Chapter 8, pages 60-62,
Mack Publishing Co., Easton, PA; Ecker and Crooke, 1995,
Bio/Technology 13:351-360.

There is a continual need for new compounds to be
tested in the in vitro assays that make up the first testing
step described above. There is also a continual need for new
assays by which the pharmacological activities of these
compounds may be tested. It is an object of the present
invention to provide such new assays to determine whether a
candidate compound is capable of affecting the binding between
a polypeptide containing a functional domain and a recognition
unit that binds to that functional domain. In particular, it
is an object of the present invention to provide polypeptides,
particularly novel ones, containing functional domains and
their corresponding recognition units for use in the above-
described assays. The use of these polypeptides greatly
expands the number of assays that may be used to screen
potential drug candidates for useful pharmacological
activities (as well as to identify potential drug candidates
that display adverse or undesirable pharmacological
activities). In one particular embodiment of the present
invention, the polypeptides contain an SH3 domain.

In one embodiment of the present invention, such
polypeptides are identified by a method comprising: using a
recognition unit that is capable of binding to a predetermined
functional domain to screen a source of polypeptides, thus
identifying novel polypeptides containing the functional
domain or a similar functional domain.

In a particular embodiment of the above-described
method, the novel polypeptide comprises an SH3 domain and is
obtained by:

(i) screening a peptide library with the SH3 domain
to obtain one or more peptides that bind the SH3 domain;

(ii) using one of the peptides from step (i),
preferably in the form of a multivalent complex, to screen a
source of polypeptides to identify one or more novel
polypeptides containing SH3 domains;

(iii) determining the amino acid sequence of the
polypeptides identified in step (ii); and

(iv) producing the one or more novel polypeptides
containing SH3 domains.

In another embodiment of the above-described method,
the novel polypeptide containing an SH3 domain is obtained by:

(i) screening a peptide library with the SH3 domain
to obtain peptides that bind the SH3 domain;

(ii) determining a consensus sequence for the
peptides obtained in step (i);

(iii) producing a peptide comprising the consensus
sequence;

(iv) using the peptide comprising the consensus
sequence to screen a source of polypeptides to identify one or
more novel polypeptides containing SH3 domains;

(v) determining the amino acid sequence of the novel
polypeptides identified in step (iv); and

(vi) producing the one or more novel polypeptides
containing SH3 domains.
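
Step (ii) of the method above, determining a consensus sequence, can be
illustrated with a short sketch. This is one simple way to do it and is
not asserted to be the procedure actually used: it assumes the binding
peptides have already been aligned to equal length, takes the most
common residue at each position, and writes "x" where no residue reaches
an assumed majority threshold. The peptide sequences and the threshold
are hypothetical.

```python
from collections import Counter


def consensus(peptides, min_fraction=0.6):
    """Per-position consensus of equal-length, pre-aligned peptides.

    A position is reported as its most common residue when that residue
    occurs in at least min_fraction of the peptides, otherwise as "x".
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(peptide) != length for peptide in peptides):
        raise ValueError("peptides must be aligned to the same length")
    result = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(peptides) >= min_fraction else "x")
    return "".join(result)


# Hypothetical aligned binders (made up for illustration).
binders = ["RPLPPLP", "RALPPLP", "RPLPSLP", "KPLPPLP"]
print(consensus(binders))
```

A peptide comprising the resulting consensus would then be produced and
used as the recognition unit in steps (iii) and (iv).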

One of ordinary skill in the art will recognize that
it will not always be necessary to utilize the entire novel
polypeptide containing the SH3 domain in the assays described
herein. Often, a portion of the polypeptide that contains the
SH3 domain will be sufficient, e.g., a glutathione
S-transferase (GST)-SH3 domain fusion protein. See Figures 10A
and 10B for a depiction of the portions of the exemplary novel
polypeptides that contain SH3 domains.

A typical assay of the present invention consists of
at least the following components: (1) a molecule (e.g.,
protein or polypeptide) comprising a functional domain; (2) a
recognition unit that selectively binds to the functional
domain; (3) a candidate compound, suspected of having the
capacity to affect the binding between the protein containing
the functional domain and the recognition unit. The assay
components may further comprise (4) a means of detecting the
binding of the protein comprising the functional domain and
the recognition unit. Such means can be, e.g., a detectable
label affixed to the protein comprising the functional domain,
the recognition unit, or the candidate compound.

In a specific embodiment, the protein comprising the
functional domain is a novel protein discovered by the methods
of the present invention.

In another specific embodiment, the invention
provides a method of identifying a compound that affects the
binding of a molecule comprising a functional domain and a
recognition unit that selectively binds to the functional
domain comprising:

(a) contacting the molecule comprising the
functional domain and the recognition unit under conditions
conducive to binding in the presence of a candidate compound
and measuring the amount of binding between the molecule and
the recognition unit;

(b) comparing the amount of binding in step (a) with
the amount of binding known or determined to occur between the
molecule and the recognition unit in the absence of the
candidate compound, where a difference in the amount of
binding between step (a) and the amount of binding known or
determined to occur between the molecule and the recognition
unit in the absence of the candidate compound indicates that
the candidate compound is a compound that affects the binding
of the molecule comprising a functional domain and the
recognition unit. In a specific embodiment, the molecule
comprising the functional domain is a novel protein discovered
by the methods of the present invention. In another specific
embodiment, the functional domain is an SH3 domain.
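
The comparison in step (b) can be expressed numerically. The sketch
below is illustrative only: it assumes binding is reported as a single
positive signal (for example, a label readout in arbitrary units) and
uses an arbitrary 20% cutoff to decide whether a difference counts. The
cutoff, the function names, and the example readings are assumptions
introduced here.

```python
def percent_change(binding_with, binding_without):
    """Percent change in binding caused by the candidate compound."""
    if binding_without <= 0:
        raise ValueError("control binding must be positive")
    return 100.0 * (binding_with - binding_without) / binding_without


def classify(binding_with, binding_without, threshold=20.0):
    """Label a compound by its effect on binding.

    threshold (in percent) is an assumed cutoff, not a disclosed value.
    """
    change = percent_change(binding_with, binding_without)
    if change <= -threshold:
        return "inhibitor"
    if change >= threshold:
        return "enhancer"
    return "no effect"


# Hypothetical readings: signal with compound, signal without compound.
print(classify(30.0, 100.0))
print(classify(150.0, 100.0))
print(classify(95.0, 100.0))
```

Because the effect of a compound may be either to increase or to decrease
binding, the sketch reports both directions rather than inhibition alone.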

In one embodiment, the assay comprises allowing the
polypeptide containing an SH3 domain to contact a recognition
unit that selectively binds to the SH3 domain in the presence
and in the absence of the candidate compound under conditions
such that binding of the recognition unit to the protein
containing an SH3 domain will occur unless that binding is
disrupted or prevented by the candidate compound. By
detecting the amount of binding of the recognition unit to the
protein containing an SH3 domain in the presence of the
candidate compound and comparing that amount of binding to the
amount of binding of the recognition unit to the protein or
polypeptide containing an SH3 domain in the absence of the
candidate compound, it is possible to determine whether the
candidate compound affects the binding and thus is a useful
lead compound for the modulation of the activity of proteins
containing the SH3 domain. The effect of the candidate
compound may be to either increase or decrease the binding.

One version of an assay suitable for use in the
present invention comprises binding the protein containing an
SH3 domain to a solid support such as the wells of a
microtiter plate. The wells contain a suitable buffer and
other substances to ensure that conditions in the wells permit
the binding of the protein or polypeptide containing an SH3
domain to its recognition unit. The recognition unit and a
candidate compound are then added to the wells. The
recognition unit is preferably labeled, e.g., it might be
biotinylated or labeled with a radioactive moiety, or it might
be linked to an enzyme, e.g., alkaline phosphatase. After a
suitable period of incubation, the wells are washed to remove
any unbound recognition unit and compound. If the candidate
compound does not interfere with the binding of the protein or
polypeptide containing an SH3 domain to the labeled
recognition unit, the labeled recognition unit will bind to
the protein or polypeptide containing an SH3 domain in the
well. This binding can then be detected. If the candidate
compound interferes with the binding of the protein or
polypeptide containing an SH3 domain and the labeled
recognition unit, label will not be present in the wells, or
will be present to a lesser degree than is the case when
compared to control wells that contain the protein or
polypeptide containing an SH3 domain and the labeled
recognition unit but to which no candidate compound is added.
Of course, it is possible that the presence of the candidate
compound will increase the binding between the protein or
polypeptide containing an SH3 domain and the labeled
recognition unit. Alternatively, the recognition unit can be
affixed to a solid substrate during the assay. Functional
domains other than SH3 domains and their corresponding
recognition units can also be used.

In a specific embodiment of the above-described
method, the protein or polypeptide containing an SH3 domain is
a novel protein or polypeptide containing an SH3 domain that
has been identified by the methods of the present invention.



5.5. Use of Polypeptides Containing Functional
Domains to Discover Polypeptides Involved in
Pharmacological Activities

Using the methods of the present invention, it is
possible to identify and isolate large numbers of polypeptides
containing functional domains, e.g., SH3 domains. Using these
polypeptides, one can construct a matrix relating the
polypeptides to an array of candidate drug compounds. For
example, Table 1 shows such a matrix.



TABLE 1

      A   B   C   D   E   F   G   H   I   J
  1
  2       X       X               X
  3
  4
  5                       X
  6
  7           X                   X
  8
  9   X
 10

In Table 1, the columns headed by letters at the top
of the table represent different polypeptides containing SH3
domains (preferably novel polypeptides identified by the
methods of the invention). The rows numbered along the left
side of the table represent recognition units with various
specificity to SH3 domains. For each candidate drug compound,
a table such as Table 1 is generated from the results of
binding assays. An X placed at the intersection of a
particular numbered row and lettered column represents a
positive assay for binding, i.e., the candidate drug compound
affected the binding of the recognition unit of that
particular row to the SH3 domain of that particular column.

Such data as that illustrated above is used to
determine whether candidate drug compounds display or are at
risk of displaying desirable or undesirable physiological or
pharmacological activities. For example, in Table 1, the drug
compound inhibits the binding of recognition unit 2 to the SH3
domains of polypeptides B, D, and H; the compound inhibits the
binding of recognition unit 5 to the SH3 domain of polypeptide
F; the compound inhibits the binding of recognition unit 7 to
the SH3 domains of polypeptides C and H; and the compound
inhibits the binding of recognition unit 9 to the SH3 domain
of polypeptide A.
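
The readout just described can be captured in a small data structure.
The sketch below encodes the positive entries of Table 1 exactly as
listed in the preceding paragraph; the variable and function names are
illustrative choices and not part of the disclosure.

```python
# Table 1 as a mapping: recognition unit (row) -> polypeptides (columns)
# whose binding to that unit was affected by the drug candidate.
TABLE_1 = {
    2: {"B", "D", "H"},
    5: {"F"},
    7: {"C", "H"},
    9: {"A"},
}


def affected_polypeptides(table):
    """All polypeptides with at least one affected interaction."""
    return sorted(set().union(*table.values()))


def units_by_polypeptide(table):
    """Invert the table: polypeptide -> sorted recognition units affected."""
    inverted = {}
    for unit, polypeptides in table.items():
        for polypeptide in polypeptides:
            inverted.setdefault(polypeptide, []).append(unit)
    return {p: sorted(units) for p, units in sorted(inverted.items())}


print(affected_polypeptides(TABLE_1))
print(units_by_polypeptide(TABLE_1)["H"])
```

Inverting the table makes it easy to see which polypeptides, such as H,
are reached through more than one recognition unit.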

If interaction with polypeptide H leads to the
desirable physiological or pharmacological activity, then this
drug candidate might be a good lead. However, interaction
with polypeptides A, B, C, D, and F would need to be
evaluated for potential side effects.

As the maps are generated and pharmacological
effects observed, the maps will allow strategic assessment of
the specificity necessary to obtain the desired
pharmacological effect. For example, if compounds 2 and 7 are
able to affect some pharmacological activity, while compounds
5 and 9 do not affect that activity, then polypeptide H is
likely to be involved in that pharmacological activity. For
example, if compounds 2 and 7 were both able to inhibit mast
cell degranulation, while compounds 5 and 9 did not, it is
likely that polypeptide H is involved in mast cell
degranulation.
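
The inference in the preceding paragraph amounts to a set operation:
keep what is common to every active entry and discard whatever any
inactive entry also touches. The sketch below applies that reasoning to
the Table 1 entries, reading "2 and 7" and "5 and 9" as rows 2, 7, 5,
and 9 of Table 1; that reading, and all names used, are assumptions made
here for illustration.

```python
def implicated_polypeptides(table, active, inactive):
    """Polypeptides hit by every active row and by no inactive row."""
    if not active:
        raise ValueError("at least one active entry is required")
    shared = set.intersection(*(set(table[row]) for row in active))
    excluded = set().union(*(table[row] for row in inactive))
    return sorted(shared - excluded)


# Positive entries of Table 1: row -> affected polypeptides.
TABLE_1 = {
    2: {"B", "D", "H"},
    5: {"F"},
    7: {"C", "H"},
    9: {"A"},
}
print(implicated_polypeptides(TABLE_1, active=[2, 7], inactive=[5, 9]))
```

With the Table 1 data, polypeptide H is the only candidate left, which
matches the conclusion drawn in the text.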

Accordingly, the present invention provides a method
of utilizing the polypeptides comprising functional domains of
the present invention in an assay to determine the
participation of those polypeptides in pharmacological
activities. In a particular embodiment, the polypeptides
comprise SH3 domains.

In another embodiment, the method comprises:

(a) contacting a drug candidate with a molecule
comprising a functional domain under conditions conducive to
binding, and detecting or measuring any specific binding that
occurs; and

(b) repeating step (a) with a plurality of different
molecules, each comprising a different functional domain but
capable of binding to a single predetermined recognition unit
under appropriate conditions.

Preferably, at least one of said molecules is a
novel polypeptide identified by the methods of the present
invention. In a specific embodiment, the molecules comprise
the SH3 domains of Src, Abl, Cortactin, Phospholipase Cγ, Nck,
Crk, p53bp2, Amphiphysin, Grb2, RasGap, or
Phosphatidylinositol 3' kinase.

The present invention also provides a method of
determining the potential pharmacological activities of a
molecule comprising:

(a) contacting the molecule with a compound 
comprising a functional domain under conditions conducive to 
binding; 

(b) detecting or measuring any specific binding that
occurs; and

(c) repeating steps (a) and (b) with a plurality of 
different compounds, each compound comprising a functional 
domain of different sequence but capable of displaying the 
same binding specificity. 

In a specific embodiment the functional domain is an
SH3 domain.

In another embodiment, the compounds comprise the
SH3 domains of Src, Abl, Cortactin, Phospholipase Cγ, Nck,
Crk, p53bp2, Amphiphysin, Grb2, RasGap, or
Phosphatidylinositol 3' kinase.

The present invention also provides a method of
identifying a compound that affects the binding of a molecule
comprising a functional domain to a recognition unit that
selectively binds to the functional domain comprising:

(a) contacting the molecule comprising the
functional domain and the recognition unit under conditions
conducive to binding in the presence of a candidate compound
and measuring the amount of binding between the molecule and
the recognition unit and in which the functional domain of
interest is a domain selected from the group consisting of an
SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat,
zinc finger, leucine zipper, and helix-turn-helix;

(b) comparing the amount of binding in step (a) with
the amount of binding known or determined to occur between the
molecule and the recognition unit in the absence of the
candidate compound, where a difference in the amount of
binding between step (a) and the amount of binding known or
determined to occur between the molecule and the recognition
unit in the absence of the candidate compound indicates that
the candidate compound is a compound that affects the binding
of the molecule comprising a functional domain and the
recognition unit.

In a specific embodiment, the functional domain is
an SH3 domain.

5.6. Use of More Than One Recognition Unit Simultaneously

It has been found that when screening a source of
polypeptides with a recognition unit, it is possible to use
more than one recognition unit at the same time. In
particular, it has been found that as many as five different
recognition units may be used simultaneously to screen a
source of polypeptides.

In particular, when the recognition units are
biotinylated peptides and the source of polypeptides is a cDNA
expression library, the steps of preconjugation of the
biotinylated peptides to streptavidin-alkaline phosphatase as
well as the steps involved in screening the cDNA expression
library may be carried out in essentially the same manner as
is done when a single biotinylated peptide is used as a
recognition unit. See Section 6.1 for details. The key
difference when using more than one biotinylated peptide at a
time is that the peptides are combined either before or at the
step where they are placed in contact with the polypeptides
from which selection occurs.

In an embodiment employing a bacteriophage
expression library to express the polypeptides, when the
positive clones are worked up to the level of isolated
plaques, the clonal bacteriophage from the isolated plaques
may be tested against each of the biotinylated peptides
individually, in order to determine to which of the several
peptides that were used as recognition units in the primary
screen the phage are actually binding.
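
The retesting step described above, in which each isolated clone is
tested against each peptide individually, can be summarized as a
per-clone lookup. The following sketch is hypothetical throughout: the
clone and peptide names, the signal values, and the 0.5 cutoff are
invented for illustration, and the signals stand in for whatever
detection readout is actually used.

```python
def deconvolve(retest_signals, threshold=0.5):
    """Map each clone to the peptides whose individual signal meets threshold.

    retest_signals maps clone -> {peptide: signal}. The threshold is an
    assumed cutoff separating binding from background.
    """
    return {
        clone: sorted(p for p, signal in signals.items() if signal >= threshold)
        for clone, signals in retest_signals.items()
    }


# Hypothetical retest of two plaque-purified clones against three peptides
# that had been pooled in the primary screen.
retest = {
    "clone_1": {"peptide_a": 0.9, "peptide_b": 0.1, "peptide_c": 0.0},
    "clone_2": {"peptide_a": 0.0, "peptide_b": 0.7, "peptide_c": 0.8},
}
print(deconvolve(retest))
```

A clone may turn out to bind more than one of the pooled peptides, so the
result is a list per clone rather than a single assignment.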

5.7. Use of Recognition Units from 

Known Amino Acid Sequences 

In many cases it may not be necessary to screen a
collection of substances, e.g., a peptide library, in order to
obtain a recognition unit for a given functional domain. In
the case of peptide recognition units, for example, it is
sometimes possible to identify a recognition unit by
inspection of known amino acid sequences. Stretches of these
amino acid sequences that resemble known binding sequences for
the functional domain can be synthesized and screened against
a source of polypeptides in order to obtain a plurality of
polypeptides comprising the given functional domain.
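
Inspection of known sequences for stretches resembling a known binding
sequence can be approximated with a pattern search. The sketch below is
an illustration under stated assumptions: it uses the proline-rich core
pattern P-x-x-P (written as the regular expression P..P) as a stand-in
for "a known binding sequence", returns each match with a few flanking
residues, and runs on a made-up protein sequence. The pattern choice,
flank length, and sequence are not taken from the disclosure.

```python
import re


def find_candidates(sequence, motif=r"P..P", flank=3):
    """Return (start, stretch) for each motif match, padded by flank residues.

    motif is an assumed regular expression for the binding core.
    """
    hits = []
    # The lookahead wrapper lets overlapping matches be reported.
    for match in re.finditer(f"(?=({motif}))", sequence):
        core_length = len(match.group(1))
        start = max(0, match.start() - flank)
        end = min(len(sequence), match.start() + core_length + flank)
        hits.append((start, sequence[start:end]))
    return hits


# Hypothetical protein sequence (made up for illustration).
protein = "MSTAPPLPPRNQWEVK"
for start, stretch in find_candidates(protein):
    print(start, stretch)
```

Each returned stretch is a candidate peptide that could be synthesized
and then screened against a source of polypeptides as described above.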

Prior to the disclosure of the present invention of
methods of preparing recognition units having generic
specificity, it would have been thought fruitless to pursue
this approach. The expectation would have been that a
recognition unit, chosen from published amino acid sequences
as described above, would have been useful, at best, to
identify a single protein containing a functional domain.

5.8. Isolation and Expression of Nucleic Acids Encoding
Polypeptides Comprising a Functional Domain

In particular aspects, the invention provides amino
acid sequences of polypeptides comprising functional domains,
preferably human polypeptides, and fragments and derivatives
thereof which comprise an antigenic determinant (i.e., can be
recognized by an antibody) or which are functionally active,
as well as nucleic acid sequences encoding the foregoing.
"Functionally active" material as used herein refers to that
material displaying one or more functional activities, e.g., a
biological activity, antigenicity (capable of binding to an
antibody), immunogenicity, or comprising a functional domain
that is capable of specific binding to a recognition unit.
In specific embodiments, the invention provides fragments of
polypeptides comprising a functional domain consisting of at
least 40 amino acids, or of at least 75 amino acids. Nucleic
acids encoding the foregoing are provided. Functional
fragments of at least 10 or 20 amino acids are also provided.

In other specific embodiments, the invention
provides nucleotide sequences and subsequences encoding
polypeptides comprising a functional domain, preferably human
polypeptides, consisting of at least 25 nucleotides, at least
50 nucleotides, or at least 150 nucleotides. Nucleic acids
encoding fragments of the polypeptides comprising a functional
domain are provided, as well as nucleic acids complementary to
and capable of hybridizing to such nucleic acids. In one
embodiment, such a complementary sequence may be complementary
to a cDNA sequence encoding a polypeptide comprising a
functional domain of at least 25 nucleotides, or of at least
100 nucleotides. In a preferred aspect, the invention
utilizes cDNA sequences encoding human polypeptides comprising
a functional domain or a portion thereof.

Any eukaryotic cell can potentially serve as the
nucleic acid source for the molecular cloning of polypeptides
comprising a functional domain. The DNA may be obtained by
standard procedures known in the art (e.g., a DNA "library")
by cDNA cloning, or by the cloning of genomic DNA, or
fragments thereof, purified from the desired cell (see, for
example, Sambrook et al., 1989, Molecular Cloning, A Laboratory
Manual, Cold Spring Harbor Laboratory, 2d. Ed., Cold Spring
Harbor, New York; Glover, D.M. (ed.), 1985, DNA Cloning: A
Practical Approach, MRL Press, Ltd., Oxford, U.K. Vol. I, II.)
Clones derived from genomic DNA may contain regulatory and
intron DNA regions in addition to coding regions; clones
derived from cDNA will contain only exon sequences. Whatever
the source, the gene encoding a polypeptide comprising a
functional domain should be molecularly cloned into a suitable
vector for propagation of the gene.

In the molecular cloning of the gene from genomic
DNA, DNA fragments are generated, some of which will encode
the desired gene. The DNA may be cleaved at specific sites
using various restriction enzymes. Alternatively, one may use
DNAse in the presence of manganese to fragment the DNA, or the
DNA can be physically sheared, as for example, by sonication.
The linear DNA fragments can then be separated according to
size by standard techniques, including but not limited to,
agarose and polyacrylamide gel electrophoresis and column
chromatography.

Once a gene encoding a particular polypeptide
comprising a functional domain has been isolated from a first
species, it is a routine matter to isolate the corresponding
gene from another species. Identification of the specific DNA
fragment from another species containing the desired gene may
be accomplished in a number of ways. For example, if an
amount of a portion of a gene or its specific RNA from the
first species, or a fragment thereof, e.g., the functional
domain, is available and can be purified and labeled, the
generated DNA fragments from another species may be screened
by nucleic acid hybridization to the labeled probe (Benton, W.
and Davis, R., 1977, Science 196, 180; Grunstein, M. and
Hogness, D., 1975, Proc. Natl. Acad. Sci. U.S.A. 72, 3961).
Those DNA fragments with substantial homology to the probe
will hybridize. In a preferred embodiment, PCR using primers
that hybridize to a known sequence of a gene of one species
can be used to amplify the homolog of such gene in a different
species. The amplified fragment can then be isolated and
inserted into an expression or cloning vector. It is also
possible to identify the appropriate fragment by restriction
enzyme digestion(s) and comparison of fragment sizes with
those expected according to a known restriction map if such is
available. Further selection can be carried out on the basis
of the properties of the gene. Alternatively, the presence of
the gene may be detected by assays based on the physical,
chemical, or immunological properties of its expressed
product. For example, cDNA clones, or DNA clones which
hybrid-select the proper mRNAs, can be selected which produce
a protein that, e.g., has similar or identical electrophoretic
migration, isoelectric focusing behavior, proteolytic digestion
maps, in vitro aggregation activity ("adhesiveness") or
antigenic properties as known for the particular polypeptide
comprising a functional domain from the first species. If an
antibody to that particular polypeptide is available, the
corresponding polypeptide from another species may be
identified by binding of labeled antibody to the putative
polypeptide-synthesizing clones, in an ELISA (enzyme-linked
immunosorbent assay)-type procedure.

Genes encoding polypeptides comprising a functional 
domain can also be identified by mRNA selection by nucleic 
acid hybridization followed by in vitro translation. In this
procedure, fragments are used to isolate complementary mRNAs
by hybridization. Such DNA fragments may represent available,
purified DNA of genes encoding polypeptides comprising a
functional domain of a first species. Immunoprecipitation
analysis or functional assays (e.g., ability to bind to a
recognition unit) of the in vitro translation products of the
isolated mRNAs identify the mRNA and, therefore, the
complementary DNA fragments that contain the desired
sequences. In addition, specific mRNAs may be selected by
adsorption of polysomes isolated from cells to immobilized
antibodies specifically directed against polypeptides
comprising a functional domain. A radiolabelled cDNA of a
gene encoding a polypeptide comprising a functional domain can
be synthesized using the selected mRNA (from the adsorbed
polysomes) as a template. The radiolabelled mRNA or cDNA may
then be used as a probe to identify the DNA fragments that
represent the gene encoding the polypeptide comprising a
functional domain of another species from among other genomic
DNA fragments. In a specific embodiment, human homologs of
mouse genes are obtained by methods described above. In
various embodiments, the human homolog is hybridizable to the
mouse homolog under conditions of low, moderate, or high
stringency. By way of example and not limitation, procedures
using such conditions of low stringency are as follows (see
also Shilo and Weinberg, 1981, Proc. Natl. Acad. Sci. USA
78:6789-6792): Filters containing DNA are pretreated for 6 h
at 40°C in a solution containing 35% formamide, 5X SSC, 50 mM
Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA,
and 500 µg/ml denatured salmon sperm DNA. Hybridizations are
carried out in the same solution with the following
modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 µg/ml
salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20 x 10^6
cpm 32P-labeled probe is used. Filters are incubated in
hybridization mixture for 18-20 h at 40°C, and then washed for
1.5 h at 55°C in a solution containing 2X SSC, 25 mM Tris-HCl
(pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is
replaced with fresh solution and incubated an additional 1.5 h
at 60°C. Filters are blotted dry and exposed for
autoradiography. If necessary, filters are washed for a third
time at 65-68°C and reexposed to film. Other conditions of
low stringency which may be used are well known in the art
(e.g., as employed for cross-species hybridizations).

By way of example and not limitation, procedures 
using conditions of high stringency are as follows: 
Prehybridization of filters containing DNA is carried out for 
8 h to overnight at 65°C in buffer composed of 6X SSC, 50 mM
Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02%
BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are
hybridized for 48 h at 65°C in prehybridization mixture
containing 100 µg/ml denatured salmon sperm DNA and 5-20 x 10^6
cpm of 32P-labeled probe. Washing of filters is done at 37°C
for 1 h in a solution containing 2X SSC, 0.01% PVP, 0.01%
Ficoll, and 0.01% BSA. This is followed by a wash in 0.1X SSC
at 50°C for 45 min before autoradiography. Other conditions
of high stringency which may be used are well known in the 
art. 
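
By way of illustration only, the two example procedures above can be written down as structured data so that their temperatures and minimum durations are easy to compare. The following Python sketch is not part of the described method; the class name, field names, and helper functions are assumptions introduced here, the values are transcribed from the text, and the final high-stringency wash temperature is assumed to be in degrees Celsius.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float     # incubation temperature, degrees Celsius
    min_hours: float  # shortest duration stated in the text


# Low stringency example (optional third wash at 65-68 C omitted).
LOW_STRINGENCY = [
    Step("prehybridization", 40, 6),
    Step("hybridization", 40, 18),
    Step("wash 1 (2X SSC)", 55, 1.5),
    Step("wash 2 (2X SSC)", 60, 1.5),
]

# High stringency example; 50 is assumed to mean 50 degrees Celsius.
HIGH_STRINGENCY = [
    Step("prehybridization", 65, 8),
    Step("hybridization", 65, 48),
    Step("wash 1 (2X SSC)", 37, 1),
    Step("wash 2 (0.1X SSC)", 50, 0.75),
]


def total_min_hours(steps):
    """Sum of the minimum stated durations, in hours."""
    return sum(step.min_hours for step in steps)


def hybridization_temp(steps):
    """Temperature of the hybridization step, in degrees Celsius."""
    return next(s.temp_c for s in steps if s.name == "hybridization")
```

In this representation the high stringency example differs from the low stringency example chiefly in its higher hybridization temperature, its longer hybridization time, and the lower salt concentration of its final wash.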

The identified and isolated gene encoding a 
polypeptide comprising a functional domain can then be 
inserted into an appropriate cloning vector. A large number
of vector-host systems known in the art may be used. Possible
vectors include, but are not limited to, plasmids or modified
viruses, but the vector system must be compatible with the
host cell used. Such vectors include, but are not limited to,
bacteriophages such as lambda derivatives, or plasmids such as
pBR322 or pUC plasmid derivatives. The insertion into a
cloning vector can, for example, be accomplished by ligating
the DNA fragment into a cloning vector which has complementary
cohesive termini. However, if the complementary restriction
sites used to fragment the DNA are not present in the cloning
vector, the ends of the DNA molecules may be enzymatically
modified. Alternatively, any site desired may be produced by
ligating nucleotide sequences (linkers) onto the DNA termini;
these ligated linkers may comprise specific chemically
synthesized oligonucleotides encoding restriction endonuclease
recognition sequences. In an alternative method, the cleaved
vector and gene may be modified by homopolymeric tailing.
Recombinant molecules can be introduced into host cells via
transformation, transfection, infection, electroporation,
etc., so that many copies of the gene sequence are generated.

In an alternative method, the desired gene may be
identified and isolated after insertion into a suitable
cloning vector in a "shot gun" approach. Enrichment for the
desired gene, for example, by size fractionation, can be
done before insertion into the cloning vector.

In specific embodiments, transformation of host 
cells with recombinant DNA molecules that incorporate the 
isolated gene, cDNA, or synthesized DNA sequence enables
generation of multiple copies of the gene. Thus, the gene may
be obtained in large quantities by growing transformants,
isolating the recombinant DNA molecules from the transformants
and, when necessary, retrieving the inserted gene from the
isolated recombinant DNA. 

The nucleic acid coding for a polypeptide comprising 
a functional domain of the invention can be inserted into an 
appropriate expression vector, i.e., a vector which contains
the necessary elements for the transcription and translation
of the inserted protein-coding sequence. The necessary
transcriptional and translational signals can also be supplied
by the native gene encoding the polypeptide and/or its
flanking regions. A variety of host-vector systems may be
utilized to express the protein-coding sequence. These
include but are not limited to mammalian cell systems infected
with virus (e.g., vaccinia virus, adenovirus, etc.); insect
cell systems infected with virus (e.g., baculovirus);
microorganisms such as yeast containing yeast vectors, or
bacteria transformed with bacteriophage DNA, plasmid DNA, or
cosmid DNA. The expression elements of vectors vary in their 
strengths and specificities. Depending on the host-vector 
system utilized, any one of a number of suitable transcription 
and translation elements may be used. 

Any of the methods previously described for the
insertion of DNA fragments into a vector may be used to
construct expression vectors containing a chimeric gene
consisting of appropriate transcriptional/translational
control signals and the protein coding sequences. These
methods may include in vitro recombinant DNA and synthetic
techniques and in vivo recombinants (genetic recombination).
Expression of nucleic acid sequence encoding a protein or
peptide fragment may be regulated by a second nucleic acid
sequence so that the protein or peptide is expressed in a host
transformed with the recombinant DNA molecule. For example,
expression of a protein may be controlled by any
promoter/enhancer element known in the art. Promoters which
may be used to control gene expression include, but are not
limited to, the SV40 early promoter region (Benoist and
Chambon, 1981, Nature 290, 304-310), the promoter contained in
the 3' long terminal repeat of Rous sarcoma virus (Yamamoto,
et al., 1980, Cell 22, 787-797), the herpes thymidine kinase
promoter (Wagner et al., 1981, Proc. Natl. Acad. Sci. U.S.A.
78, 1441-1445), the regulatory sequences of the
metallothionein gene (Brinster et al., 1982, Nature 296,
39-42); prokaryotic expression vectors such as the β-lactamase
promoter (Villa-Kamaroff, et al., 1978, Proc. Natl. Acad. Sci.
U.S.A. 75, 3727-3731), or the tac promoter (DeBoer, et al.,
1983, Proc. Natl. Acad. Sci. U.S.A. 80, 21-25); see also
"Useful proteins from recombinant bacteria" in Scientific
American, 1980, 242, 74-94; plant expression vectors
comprising the nopaline synthetase promoter region
(Herrera-Estrella et al., Nature 303, 209-213) or the cauliflower
mosaic virus 35S RNA promoter (Gardner, et al., 1981, Nucl.
Acids Res. 9, 2871), and the promoter of the photosynthetic
enzyme ribulose biphosphate carboxylase (Herrera-Estrella et
al., 1984, Nature 310, 115-120); promoter elements from yeast
or other fungi such as the Gal 4 promoter, the ADC (alcohol
dehydrogenase) promoter, PGK (phosphoglycerol kinase)
promoter, alkaline phosphatase promoter, and the following
animal transcriptional control regions, which exhibit tissue
specificity and have been utilized in transgenic animals:
elastase I gene control region which is active in pancreatic
acinar cells (Swift et al., 1984, Cell 38, 639-646; Ornitz et
al., 1986, Cold Spring Harbor Symp. Quant. Biol. 50, 399-409;
MacDonald, 1987, Hepatology 7, 425-515); insulin gene control
region which is active in pancreatic beta cells (Hanahan,
1985, Nature 315, 115-122), immunoglobulin gene control region
which is active in lymphoid cells (Grosschedl et al., 1984,
Cell 38, 647-658; Adames et al., 1985, Nature 318, 533-538;
Alexander et al., 1987, Mol. Cell. Biol. 7, 1436-1444), mouse
mammary tumor virus control region which is active in
testicular, breast, lymphoid and mast cells (Leder et al.,
1986, Cell 45, 485-495), albumin gene control region which is
active in liver (Pinkert et al., 1987, Genes and Devel. 1,
268-276), alpha-fetoprotein gene control region which is
active in liver (Krumlauf et al., 1985, Mol. Cell. Biol. 5,
1639-1648; Hammer et al., 1987, Science 235, 53-58); alpha
1-antitrypsin gene control region which is active in the liver
(Kelsey et al., 1987, Genes and Devel. 1, 161-171), beta-globin
gene control region which is active in myeloid cells
(Mogram et al., 1985, Nature 315, 338-340; Kollias et al.,
1986, Cell 46, 89-94); myelin basic protein gene control region
which is active in oligodendrocyte cells in the brain
(Readhead et al., 1987, Cell 48, 703-712); myosin light
chain-2 gene control region which is active in skeletal muscle
(Sani, 1985, Nature 314, 283-286), and gonadotropic releasing
hormone gene control region which is active in the
hypothalamus (Mason et al., 1986, Science 234, 1372-1378).

Expression vectors containing inserts of genes 
encoding polypeptides comprising a functional domain can be 
identified by three general approaches: (a) nucleic acid 
hybridization, (b) presence or absence of "marker" gene
functions, and (c) expression of inserted sequences. In the
first approach, the presence of a foreign gene inserted in an
expression vector can be detected by nucleic acid
hybridization using probes comprising sequences that are
homologous to the inserted gene. In the second approach, the
recombinant vector/host system can be identified and selected
based upon the presence or absence of certain "marker" gene
functions (e.g., thymidine kinase activity, resistance to
antibiotics, transformation phenotype, occlusion body
formation in baculovirus, etc.) caused by the insertion of
foreign genes in the vector. For example, if the gene
encoding a polypeptide comprising a functional domain is
inserted within the marker gene sequence of the vector,
recombinants containing the gene can be identified by the
absence of the marker gene function. In the third approach,
recombinant expression vectors can be identified by assaying
the foreign gene product expressed by the recombinant. Such
assays can be based, for example, on the physical or
functional properties of the gene product in in vitro assay
systems, e.g., ability to bind to recognition units.

Once a particular recombinant DNA molecule is 
identified and isolated, several methods known in the art may
be used to propagate it. Once a suitable host system and
growth conditions are established, recombinant expression
vectors can be propagated and prepared in quantity. As
previously explained, the expression vectors which can be used
include, but are not limited to, the following vectors or
their derivatives: human or animal viruses such as vaccinia
virus or adenovirus; insect viruses such as baculovirus; yeast
vectors; bacteriophage vectors (e.g., lambda), and plasmid and
cosmid DNA vectors, to name but a few.

In addition, a host cell strain may be chosen which 
modulates the expression of the inserted sequences, or
modifies and processes the gene product in the specific
fashion desired. Expression from certain promoters can be
elevated in the presence of certain inducers; thus, expression
of the protein may be controlled. Furthermore, different host
cells have characteristic and specific mechanisms for the
translational and post-translational processing and
modification (e.g., glycosylation, cleavage) of proteins.
Appropriate cell lines or host systems can be chosen to ensure
the desired modification and processing of the foreign protein
expressed. For example, expression in a bacterial system can
be used to produce an unglycosylated core protein product.
Expression in yeast will produce a glycosylated product.
Expression in mammalian cells can be used to ensure "native"
glycosylation of a heterologous protein. Furthermore,
different vector/host expression systems may effect processing
reactions such as proteolytic cleavages to different extents. 

In other specific embodiments, polypeptides 
comprising a functional domain, or fragments, analogs, or 
derivatives thereof may be expressed as a fusion, or chimeric 
protein product (comprising the polypeptide, fragment, analog,
or derivative joined via a peptide bond to a heterologous
protein sequence (of a different protein)). Such a chimeric
product can be made by ligating the appropriate nucleic acid
sequences encoding the desired amino acid sequences to each
other by methods known in the art, in the proper reading
frame, and expressing the chimeric product by methods commonly
known in the art. Alternatively, such a chimeric product may
be made by protein synthetic techniques, e.g., by use of a 
peptide synthesizer. 

5.8.1 Identification and Purification of the
      Expressed Gene Product

Once a recombinant which expresses the gene sequence 
encoding a polypeptide comprising a functional domain is 
identified, the gene product may be analyzed. This can be 
achieved by assays based on the physical or functional 
properties of the product, including radioactive labelling of 
the product followed by analysis by gel electrophoresis. 

Once the polypeptide comprising a functional domain 
is identified, it may be isolated and purified by standard 
methods including chromatography (e.g., ion exchange, 
affinity, and sizing column chromatography), centrifugation,
differential solubility, or by any other standard technique 
for the purification of proteins. The functional properties 
may be evaluated using any suitable assay, including, but not 
limited to, binding to a recognition unit. 

5.9 Derivatives and Analogs of Polypeptides Comprising a 
Functional Domain 

The invention further provides derivatives 
(including but not limited to fragments) and analogs of 
polypeptides that are functionally active, e.g., comprising a 
functional domain. In a specific embodiment, the derivative 
or analog is functionally active, i.e., capable of exhibiting 
one or more functional activities associated with a full- 
length, wild-type polypeptide, e.g., binding to a recognition 
unit. As one example, such derivatives or analogs may have 
the antigenicity of the full-length polypeptide. 
In particular, derivatives can be made by altering
gene sequences encoding polypeptides comprising a functional
domain by substitutions, additions or deletions that provide
for functionally equivalent molecules. Due to the degeneracy
of nucleotide coding sequences, other DNA sequences which
encode substantially the same amino acid sequence as a gene
encoding a polypeptide comprising a functional domain may be
used in the practice of the present invention. These include
but are not limited to nucleotide sequences comprising all or
portions of such genes which are altered by the substitution
of different codons that encode a functionally equivalent
amino acid residue within the sequence, thus producing a
silent change. Likewise, the derivatives of the invention
include, but are not limited to, those containing, as a
primary amino acid sequence, all or part of the amino acid
sequence of a polypeptide comprising a functional domain
including altered sequences in which functionally equivalent
amino acid residues are substituted for residues within the
sequence resulting in a silent change. For example, one or
more amino acid residues within the sequence can be
substituted by another amino acid of a similar polarity which
acts as a functional equivalent, resulting in a silent
alteration. Substitutes for an amino acid within the sequence
may be selected from other members of the class to which the
amino acid belongs. For example, the nonpolar (hydrophobic)
amino acids include alanine, leucine, isoleucine, valine,
proline, phenylalanine, tryptophan and methionine. The polar
neutral amino acids include glycine, serine, threonine,
cysteine, tyrosine, asparagine, and glutamine. The positively
charged (basic) amino acids include arginine, lysine and
histidine. The negatively charged (acidic) amino acids
include aspartic acid and glutamic acid.
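
By way of illustration only, the four classes listed above can be encoded as a lookup table, and a proposed substitution can be checked for membership in the same class. The following Python sketch uses the class membership stated in the text; the three-letter residue codes, the class labels, and the function names are assumptions introduced here, and same-class membership is only a rough guide to functional equivalence.

```python
# Residue classes as listed in the text (three-letter codes).
AMINO_ACID_CLASSES = {
    "nonpolar": {"Ala", "Leu", "Ile", "Val", "Pro", "Phe", "Trp", "Met"},
    "polar_neutral": {"Gly", "Ser", "Thr", "Cys", "Tyr", "Asn", "Gln"},
    "basic": {"Arg", "Lys", "His"},
    "acidic": {"Asp", "Glu"},
}


def class_of(residue):
    """Return the name of the class to which a residue belongs."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def is_same_class_substitution(original, replacement):
    """True if both residues fall in the same class listed above."""
    return class_of(original) == class_of(replacement)
```

For example, replacing leucine with isoleucine stays within the nonpolar class, whereas replacing aspartic acid with lysine crosses from the acidic to the basic class.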

Derivatives or analogs of genes encoding 
polypeptides comprising a functional domain include but are 
not limited to those polypeptides which are substantially
homologous to the genes or fragments thereof, or whose
encoding nucleic acid is capable of hybridizing to a nucleic
acid sequence of the genes. 

The derivatives and analogs of the invention can be 
produced by various methods known in the art. The 
manipulations which result in their production can occur at
the gene or protein level. For example, the cloned gene
sequence can be modified by any of numerous strategies known
in the art (Maniatis, T., 1989, Molecular Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold
Spring Harbor, New York). The sequence can be cleaved at
appropriate sites with restriction endonuclease(s), followed
by further enzymatic modification if desired, isolated, and
ligated in vitro. PCR primers can be constructed so as to
introduce desired sequence changes during PCR amplification of
a nucleic acid encoding the desired polypeptide. In the
production of the gene encoding a derivative or analog, care
should be taken to ensure that the modified gene remains
within the same translational reading frame, uninterrupted by
translational stop signals, in the gene region where the
desired activity is encoded.
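
By way of illustration only, the reading-frame requirement just described can be checked computationally. The following Python sketch assumes that the coding region is supplied as a DNA string beginning at the first codon and excluding the natural terminal stop codon; the function names are assumptions introduced here, and the three stop codons are those of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def codons(seq):
    """Split a DNA sequence into complete in-frame triplets."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def frame_is_open(coding_seq):
    """True if the length is a multiple of three and no in-frame stop occurs."""
    if len(coding_seq) % 3 != 0:
        return False
    return not any(c in STOP_CODONS for c in codons(coding_seq))


def modification_preserves_frame(original, modified):
    """True if the length change is a multiple of three and the
    modified coding region contains no in-frame stop codon."""
    length_change = len(modified) - len(original)
    return length_change % 3 == 0 and frame_is_open(modified)
```

An insertion or deletion of three nucleotides keeps the downstream codons in register, whereas a change of one or two nucleotides shifts the frame; a stop codon that appears only out of frame is ignored by this check.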

Additionally, the sequence of the genes encoding 
polypeptides comprising a functional domain can be mutated in 
vitro or in vivo, to create and/or destroy translation, 
initiation, and/or termination sequences, or to create
variations in coding regions and/or form new restriction
endonuclease sites or destroy preexisting ones, to facilitate
further in vitro modification. Any technique for mutagenesis
known in the art can be used, including but not limited to, in
vitro site-directed mutagenesis (Hutchinson, C., et al., 1978,
J. Biol. Chem. 253:6551), use of TAB® linkers (Pharmacia), etc.

Manipulations of the sequence may also be made at 
the protein level. Included within the scope of the invention 
are protein fragments or other derivatives or analogs which 
are differentially modified during or after translation, e.g.,
by glycosylation, acetylation, phosphorylation, amidation,
derivatization by known protecting/blocking groups,
proteolytic cleavage, linkage to an antibody molecule or other
cellular ligand, etc. Any of numerous chemical modifications
may be carried out by known techniques, including but not
limited to specific chemical cleavage by cyanogen bromide,
trypsin, chymotrypsin, papain, V8 protease, NaBH4;
acetylation, formylation, oxidation, reduction; metabolic
synthesis in the presence of tunicamycin; etc. 

In addition, analogs and derivatives can be 
chemically synthesized. For example, a peptide corresponding 
to a portion of a polypeptide comprising a functional domain
can be synthesized by use of a peptide synthesizer.
Furthermore, if desired, nonclassical amino acids or chemical
amino acid analogs can be introduced as a substitution or
addition into the sequence. Non-classical amino acids include
but are not limited to the D-isomers of the common amino
acids, α-amino isobutyric acid, 4-aminobutyric acid,
hydroxyproline, sarcosine, citrulline, cysteic acid,
t-butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, designer amino acids such as
β-methyl amino acids, Cα-methyl amino acids, and Nα-methyl amino
acids.

5.10 Antibodies to Polypeptides Comprising
     a Functional Domain

According to one embodiment, the invention provides 
antibodies and fragments thereof containing the binding domain 
thereof, directed against polypeptides comprising a functional 
domain. Accordingly, polypeptides comprising a functional 
domain, fragments or analogs or derivatives thereof, in 
particular, may be used as immunogens to generate antibodies 
against such polypeptides, fragments or analogs or
derivatives. Such antibodies can be polyclonal, monoclonal,
chimeric, single chain, Fab fragments, or from an Fab
expression library. In a specific embodiment, antibodies
specific to the functional domain of a polypeptide comprising
a functional domain may be prepared.

Various procedures known in the art may be used for 
the production of polyclonal antibodies. In a particular 
embodiment, rabbit polyclonal antibodies to an epitope of a
polypeptide comprising a functional domain, or a subsequence
thereof, can be obtained. For the production of antibody,
various host animals can be immunized by injection with the
native polypeptide comprising a functional domain, or a
synthetic version, or fragment thereof, including but not
limited to rabbits, mice, rats, etc. Various adjuvants may be
used to increase the immunological response, depending on the
host species, and including but not limited to Freund's
(complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin,
pluronic polyols, polyanions, peptides, oil emulsions, keyhole
limpet hemocyanins, dinitrophenol, and potentially useful
human adjuvants such as BCG (bacille Calmette-Guerin) and
Corynebacterium parvum.

For preparation of monoclonal antibodies, any
technique which provides for the production of antibody
molecules by continuous cell lines in culture may be used.
Examples include the hybridoma technique originally developed by
Kohler and Milstein (1975, Nature 256, 495-497), as well as
the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., 1983, Immunology Today 4, 72), and the
EBV-hybridoma technique to produce human monoclonal antibodies
(Cole et al., 1985, in Monoclonal Antibodies and Cancer
Therapy, Alan R. Liss, Inc., pp. 77-96).

Antibody fragments which contain the idiotype 
(binding domain) of the molecule can be generated by known 
techniques. For example, such fragments include but are not 
limited to: the F(ab')2 fragment which can be produced by
pepsin digestion of the antibody molecule; the Fab' fragments
which can be generated by reducing the disulfide bridges of
the F(ab')2 fragment, and the Fab fragments which can be
generated by treating the antibody molecule with papain and a 
reducing agent. 

In the production of antibodies, screening for the
desired antibody can be accomplished by techniques known in
the art, e.g., ELISA (enzyme-linked immunosorbent assay).

6. EXAMPLES

6.1. Identification of Genes from cDNA Expression
     Libraries

A study was initiated to determine whether peptide
recognition units could recognize functional domains that are
the same as or similar to their target functional domain but
that are contained in proteins other than the protein
containing their target functional domain. Such "functional"
screens, using recognition units of relatively small size,
were not previously known and were difficult to develop
because of the low degree of sequence homology among
functional domain-containing proteins. Thus, for example, an
oligonucleotide probe could not be designed with any degree of
confidence based on the low degree of homology of primary
sequences of SH3 domains.

Using SH3 domain-binding peptides from combinatorial
peptide libraries as recognition units, we screened a series
of mouse and human cDNA expression libraries. We found that
69 of the 74 clones isolated from the libraries encoded at
least one SH3 domain. These clones represent more than 18
different SH3 domain-containing proteins, of which more than
10 have not been described previously.

The initial recognition unit chosen was a Src SH3 
domain-binding peptide (termed pSrcCII) isolated from a
phage-displayed random peptide library (Sparks et al., 1994, J.
Biol. Chem. 269:23853-23856). pSrcCII was
(biotin-SGSGGILAPPVPPRHTR-NH2) (SEQ ID NO:1). pSrcCII was
synthesized by standard FMOC chemistry and purified by HPLC, and
its structure was confirmed by mass spectrometry and amino acid
analysis. To form multivalent complexes, 50 pmol biotinylated
pSrcCII peptide was incubated with 2 µg streptavidin-alkaline
phosphatase (SA-AP) (for a biotin:biotin-binding site ratio of
1:1). Excess biotin-binding sites were blocked by addition of
500 pmol biotin. Alternatively, 31.2 µl of 1 mg/ml SA-AP
could have been incubated with 15 µl of 0.1 mM biotinylated
peptide for 30 min at 4°C. Ten µl of 0.1 mM biotin would
then be added, and the solution incubated for an additional 15
min.
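
By way of illustration only, the molar amounts implied by the alternative procedure above follow from amount = volume x concentration. The following Python sketch reads the stated volumes as microliters and the concentrations as millimolar, which is an assumption made here; the function and variable names are likewise introduced only for illustration.

```python
def picomoles(volume_ul, concentration_mm):
    """Amount of solute in pmol for a volume in microliters of a
    solution whose concentration is given in millimolar.

    1 mM = 1 nmol per microliter = 1000 pmol per microliter.
    """
    return volume_ul * concentration_mm * 1000.0


# Alternative procedure: 15 ul of 0.1 mM biotinylated peptide,
# followed by 10 ul of 0.1 mM free biotin.
peptide_pmol = picomoles(15, 0.1)
biotin_pmol = picomoles(10, 0.1)
```

Under these assumptions the alternative procedure uses about 1500 pmol of biotinylated peptide and about 1000 pmol of free biotin.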

A λEXlox mouse 16 day embryo cDNA expression library
was obtained from Novagen (Madison, WI). The cDNA library was
screened according to published protocols (Young and Davis,
1983, Proc. Natl. Acad. Sci. USA 80:1194-1198). The library
was plated at an initial density of 30,000 plaques/100 mm
petri plate as follows. A library aliquot was diluted 1:1000
in SM (100 mM NaCl, 8 mM MgSO4, 50 mM Tris-HCl pH 7.5, 0.01%
gelatin). Three µl of diluted phage were added to 1.5 ml each
of SM, 10 mM CaCl2/MgCl2, and an overnight culture of
BL21(DE3)pLysE E. coli cells. BL21 overnight cultures were
grown in 2xYT medium (1.6% tryptone, 1% yeast extract, and
0.5% NaCl) supplemented with 10 mM MgSO4, 0.2% maltose, and
25 µg/ml chloramphenicol. This mixture was incubated 20 min at
37°C, after which 300 µl were plated on each of 14 2xYT agar
plates in 3 ml 0.8% 2xYT top agarose containing 25 µg/ml
chloramphenicol. Plaques were allowed to form for 6 hours at
37°C, after which isopropyl-β-D-thiogalactopyranoside
(IPTG)-soaked filters were applied. After an additional eight
hours' incubation at 37°C, the filters were marked, removed from
the plates, and washed three times with phosphate buffered saline
(PBS; 137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM KH2PO4),
0.1% Triton X-100. The filters were blocked for 1 hour in
PBS, 2% bovine serum albumin (blocking solution) and
subsequently incubated overnight at 4°C with fresh blocking
solution plus streptavidin-alkaline phosphatase (SA-AP)
complexed peptide. Approximately 1 µg SA-AP complexed with
peptide in 1 ml blocking solution was used for each filter.
The filters were then subjected to four 15 minute washes with
PBS, 0.1% Triton X-100. Bound SA-AP-peptide complexes were
detected by incubation with 44 µl nitroblue tetrazolium
chloride (NBT, 75 mg/ml in 70% dimethylformamide) and 33 µl of
5-bromo-4-chloro-3-indolyl-phosphate-p-toluidine salt (BCIP, 50
mg/ml in dimethylformamide) in 10 ml of alkaline phosphatase
buffer (0.1 M Tris-HCl, pH 9.4, 0.1 M NaCl, 50 mM MgCl2); the
signals were robust, often evident within a few minutes.
Positive plaques were cored with a Pasteur pipet and placed in
1 ml SM with a drop of chloroform. Lambda phage particles are
structurally resistant to chloroform, which serves as a
bactericidal agent. These cores were allowed to diffuse into
solution for at least 1 hr before subsequent platings. Phage
from cores were plated in 100 µl each of SM, 10 mM CaCl2/MgCl2,
and an overnight culture of BL21(DE3)pLysE cells. Phage
were plated with the intention of reducing the number of
plaque forming units (pfu)/plate by roughly a factor of 10
with each screen (i.e., 3 x 10^4 in the primary screen, 3 x 10^3
in the secondary, and so on). This was accomplished by
diluting cores 1:1000 and plating 1-10 µl/plate. Four screens
were generally required to obtain isolated plaques.
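
By way of illustration only, the intended plating densities for successive screens can be tabulated from the stated initial density of 30,000 plaques per plate and the roughly ten-fold reduction per screen. The following Python sketch is a simple arithmetic aid; the function and variable names are assumptions introduced here, and actual densities would vary with the titer of each core.

```python
def target_pfu_per_plate(primary_density, reduction_factor, n_screens):
    """Intended plaque forming units per plate for each successive
    screen, reducing the density by a fixed factor each time."""
    return [primary_density // reduction_factor ** i for i in range(n_screens)]


# Four screens, starting from 30,000 pfu per plate, ten-fold steps.
targets = target_pfu_per_plate(30_000, 10, 4)
```

With four screens the intended density falls from 30,000 to about 30 pfu per plate, which is consistent with obtaining well-separated plaques by the fourth screen.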

Plasmids were rescued from the λEXlox phage by
Cre-mediated excision in BM25.8 E. coli cells. For each clone, 5
µl of a 1:100 dilution of phage were added to a solution
containing 100 µl SM and 100 µl of an overnight culture of
BM25.8 cells (grown in 2xYT media supplemented with 10 mM
MgSO4, 0.2% maltose, 34 µg/ml chloramphenicol, and 50 µg/ml
kanamycin). After 30 minutes at 37°C, 100 µl of this
solution were spread on an LB amp agarose plate and incubated
overnight at 37°C. A single colony from each plate was used
to inoculate 3 ml of 2xYT/amp and incubated overnight.
Plasmid DNA was purified from the overnight culture using
Promega Wizard Miniprep DNA purification kits (Promega,
Madison, WI), extracted with an equal volume of
phenol/chloroform followed by chloroform alone, and ethanol
precipitated. This plasmid DNA was used to transform
chemically competent DH5α cells. Three colonies from each
transformation were used to inoculate 3 ml cultures; DNA was
purified as described above. Approximately 1/20 of each
individually purified DNA sample from transformed cells was
digested with EcoRI and HindIII and examined by
electrophoresis on a 1% agarose gel to determine insert size
and DNA quality. One DNA prep for each clone was sequenced
either manually using the dideoxy method or by an automated
technique that uses fluorescent dideoxynucleotide terminators.
The T7 gene 10 primer located approximately 40 bp upstream of
the EcoRI restriction site was used conveniently in both
cases.

Approximately 100 of 1X10* plaques in the primary 
screen of the λEXlox 16 day mouse embryo cDNA expression
library exhibited significant pSrcCII-binding activity.
Figure 5 is representative of filters from primary and
tertiary screens. Of the eighteen positive clones that were
isolated and sequenced, all were found to encode proteins with
SH3 domains, although several clones appeared to be siblings
or to originate from the same mRNA. Thus, the pSrcCII screen
resulted in the identification of cDNAs encoding nine distinct
SH3 domain-containing proteins (see Figure 9). The sequences
of these proteins were compared to the sequences in GenBank
with the computer program BLAST. Three of these proteins
corresponded to entries in GenBank. SH3P1 appears to be the
murine homologue of p53bp2, a p53-binding protein
(Iwabuchi et al., 1994, Proc. Natl. Acad. Sci. USA
91:6098-6102); SH3P6 resembles human MLN50, a gene amplified in
some breast carcinomas (Tomasetto et al., 1995, Genomics
28:367-376); and SH3P5 is Cortactin, a protein implicated in
cytoskeletal organization (Wu and Parsons, 1993, J. Cell Biol.
120:1417-1426). Six of the clones did not match entries in
GenBank, indicating that the present invention can be used to
identify novel SH3 domain-containing proteins. Of these novel
proteins, SH3P2 contains three ankyrin repeats and a
proline-rich region flanking its SH3 domain; SH3P7 and SH3P9
contain sequences related to regions in the proteins drebrin
(Ishikawa et al., 1994, J. Biol. Chem. 269:29928-29933) and
amphiphysin (David et al., 1994, FEBS Lett. 351:73-79),
respectively. Finally, the novel proteins SH3P4 and SH3P8,
although not similar to any known proteins, are highly related
(89% amino acid similarity) to one another.

The present invention can be used as part of an
iterative process in which a recognition unit is used to
identify proteins containing functional domains which are, in
turn, used to derive additional recognition units for
subsequent screens. For example, to define the binding
specificity of these newly cloned SH3 domains, they can be
overexpressed as glutathione S-transferase (GST)-fusion
proteins in bacteria, which, in turn, can be used to screen a
random peptide library in order to obtain recognition units
which, in turn, can be used to screen cDNA libraries in order
to obtain still more novel proteins containing SH3 domains.
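
The iterative process described in this paragraph can be sketched as a
simple loop. This is only an illustration under stated assumptions, not
the procedure used in the examples: `screen_library` and `derive_units`
are hypothetical stand-ins for the two wet-lab steps (screening a cDNA
expression library with a recognition unit, and screening a random
peptide library with a GST fusion of a newly cloned domain), and the toy
identifiers are invented for the demonstration.

```python
from typing import Callable, Iterable, Set


def iterative_screen(
    seed_units: Iterable[str],
    screen_library: Callable[[str], Set[str]],   # hypothetical: unit -> domains found
    derive_units: Callable[[str], Set[str]],     # hypothetical: domain -> new units
    max_rounds: int = 3,
) -> Set[str]:
    """Alternate between library screening and recognition-unit derivation."""
    known_domains: Set[str] = set()
    tried_units: Set[str] = set()
    pending = set(seed_units)
    for _ in range(max_rounds):
        if not pending:
            break
        # Screen with every untried recognition unit; keep only new domains.
        new_domains: Set[str] = set()
        for unit in pending:
            new_domains |= screen_library(unit) - known_domains
        tried_units |= pending
        known_domains |= new_domains
        # Each new domain may yield recognition units for the next round.
        pending = set()
        for domain in new_domains:
            pending |= derive_units(domain) - tried_units
    return known_domains


# Toy data (invented): which domains each unit pulls out, and which
# units each domain would select from a random peptide library.
TOY_HITS = {"u1": {"A", "B"}, "u2": {"B", "C"}, "u3": {"D"}}
TOY_LIGANDS = {"A": {"u2"}, "B": {"u1"}, "C": {"u3"}, "D": set()}

found = iterative_screen(["u1"], TOY_HITS.__getitem__, TOY_LIGANDS.__getitem__)
print(sorted(found))
```

Capping the number of rounds reflects the practical point that each
round is a full screen; the loop also stops early once no untried
recognition units remain.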

The recognition unit binding preferences of two of
the SH3 domains isolated in the pSrcCII screen described above
(p53bp2 and Cortactin) have been described (Sparks et al.,
1996, Proc. Natl. Acad. Sci. USA 93:1540-1544). Each of these
SH3 domains recognizes recognition unit motifs related to, yet
distinct from, the pSrcCII sequence. We used a synthetic
peptide (pCort) containing the Cortactin SH3 recognition unit
motif to screen the mouse embryo cDNA expression library.
pCort was (biotin-SGSGSRLTPQSKPPLPPKPSWVSR-NH2) (SEQ ID NO:2).
pCort was prepared and complexed with SA-AP as above for
pSrcCII. Screening of the mouse embryo library with pCort was
done as above for pSrcCII.
Twenty-six clones, of varying signal strength, were
isolated and twenty-one were found to encode SH3
domain-containing proteins. The pCort screen yielded genes
corresponding to nine distinct SH3 domain-containing proteins
(see Figure 9), four of which corresponded to entries in
GenBank. SH3P5 and SH3P6 are Cortactin and MLN50, discussed
above; SH3P10 matched SPY75/HS1, a protein involved in IgE
signaling (Fukamachi et al., 1994, J. Immunol. 152:642-652);
and SH3P11 is Crk, an SH2 domain and SH3 domain-containing
adaptor molecule (Knudsen et al., 1994, J. Biol. Chem.
269:32781-32787). The five novel transcripts encode SH3P7,
SH3P8, and SH3P9, discussed above; SH3P13, an additional
member of the SH3P4/SH3P8 family; and SH3P12, a protein with
three SH3 domains and a region sharing significant sequence
similarity with the peptide hormone sorbin (Vagne-Descroix
et al., 1991, Eur. J. Biochem. 201:53-59).

Interestingly, the output from the pCort screen only
partially overlapped with that of the pSrcCII screen: four of
the nine SH3-containing proteins isolated with pCort were not
identified with pSrcCII. In addition, SH3P9, the protein
identified most frequently (50%) in the pSrcCII screen, was
isolated at a much lower frequency (7%) with the pCort probe.
Thus, different recognition units can be used to identify
distinct sets of SH3 domains.
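
The overlap between the two screens can be checked with plain set
arithmetic. The membership lists below are compiled from the text above;
placing SH3P3 in the pSrcCII set is an assumption (the text names eight
of the nine pSrcCII proteins explicitly), so treat this as a
bookkeeping sketch rather than a restatement of Figure 9.

```python
# Proteins recovered in each mouse embryo library screen, as read from the text.
# SH3P3 is assumed to be the ninth pSrcCII protein (not named explicitly above).
psrccii_screen = {f"SH3P{i}" for i in range(1, 10)}            # SH3P1 .. SH3P9
pcort_screen = {"SH3P5", "SH3P6", "SH3P7", "SH3P8", "SH3P9",
                "SH3P10", "SH3P11", "SH3P12", "SH3P13"}

shared = psrccii_screen & pcort_screen        # found with both probes
only_pcort = pcort_screen - psrccii_screen    # found with pCort only
combined = psrccii_screen | pcort_screen      # all distinct proteins

print(len(shared), sorted(only_pcort), len(combined))
```

Under these assumptions the counts agree with the statements in the
text: four of the nine pCort proteins are absent from the pSrcCII set,
and the two screens together give thirteen distinct proteins.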

In addition to possessing at least one SH3 domain, a
prominent characteristic of the proteins identified in the
pSrcCII and pCort screens is the position of the SH3 domain
within the proteins: twelve of thirteen proteins possess SH3
domains near their C-termini. Although pSrcCII binds well to
the Src SH3 domain (Figure 8), Src (whose SH3 domain occurs
near the N-terminus) was not identified in the pSrcCII screen.
We suspect the bias was a consequence of the fact that the
mouse embryo cDNA library was constructed using
oligo-dT-primed cDNA. Alternatively, it may be that the mRNA
used to prepare the library contained very little, or no, Src
transcripts.

A variant of the pSrcCII peptide (T12SRC.1) was used
to probe a λgt22a human prostate cancer cell line cDNA library
primed with oligo-dT and a λgt11 human bone marrow library
primed with random and oligo-dT primers. T12SRC.1 was
(biotin-GILAPPVPPRNTR-NH2) (SEQ ID NO:3). T12SRC.1 was used
in the initial screens together with the peptide T12SRC.4.
T12SRC.4 was (biotin-VLKRPLPIPPVTR-NH2) (SEQ ID NO:4). The
λgt22a human prostate cancer cell line cDNA library was made
from the LNCaP prostate cancer cell line by using standard
methods, i.e., the Superscript Lambda system for cDNA
synthesis and cloning (Bethesda Research Laboratories,
Gaithersburg, MD). The λgt11 human bone marrow cDNA
expression library was obtained from Clontech (Palo Alto, CA).
The human libraries were screened and positive clones isolated
as described above for the mouse 16 day embryo cDNA library,
except that cDNA inserts of the λgt11 and λgt22a phage were
amplified by PCR rather than being rescued by cre-mediated
excision. Of the 1.2X10' λcDNA clones screened from these
libraries, 30 exhibited detectable pSrcCII-binding activity.
Analysis of the positive clones revealed that they each
encoded at least one SH3 domain, and that they originated from
a total of six different transcripts (Figure 9). Three of
these encode proteins possessing non-C-terminal SH3 domains,
indicating that the present invention can be used to identify
active domains regardless of their position within a protein.
Of the six proteins identified, only three matched GenBank
entries. SH3P15 and SH3P16 are Fyn (Kawakami et al., 1988,
Proc. Natl. Acad. Sci. USA 85:3870-3874) and Lyn (Yamanashi et
al., 1987, Mol. Cell. Biol. 7:237-243), respectively, two
Src-family members possessing SH3 domains with ligand
preferences similar to that of the Src SH3 domain (Rickles,
1994, EMBO J. 13:5598-5604); and SH3P14 appears to be the
human homologue of murine H74, a protein of unknown function.
The three remaining proteins did not match entries in GenBank
and include the human homolog of SH3P9, described above, and
SH3P17 and SH3P18, fragments of two related (85% amino acid
similarity) adaptor-like proteins comprising at least four
and three SH3 domains, respectively.
Examination of the primary sequences of the SH3
domains identified in this work reveals several interesting
features (see Figure 10). Positions important for ligand
binding by the Src SH3 domain (Feng et al., 1994, Science
266:1241-1247; Lescure et al., 1992, J. Mol. Biol. 228:387-94)
and essential for SH3 function in Grb2/Sem5 are conserved
(Clark et al., 1992, Nature 356:340-344). In addition, the
two gaps in the sequence alignment shown in Figure 10
correspond to regions of length variation observed among
previously characterized SH3 domains. Surprisingly, the SH3
domains identified in this work are not significantly more
similar to one another than they are to other known SH3
domains, with the exception of the mouse and human forms of
SH3P9 and SH3P14, which are 100% and 83% identical,
respectively. This result indicates that SH3 domains can vary
widely in primary structure and still bind proline-rich
peptide recognition units selectively.
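
Percent identity figures such as those quoted above are computed over
aligned sequences. The sketch below shows one common convention
(columns containing a gap are skipped); the text does not state which
convention was used for Figure 10, so that choice is an assumption, and
the short sequences in the example are invented, not SH3 domain
sequences from this work.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # assumed convention: ignore columns with a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy alignment: one mismatch in six compared columns.
print(round(percent_identity("ALYDYE", "ALFDYE"), 1))
```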
6.1.1. Nucleotide and Corresponding Amino Acid
Sequences of Genes Identified from cDNA
Expression Libraries

The nucleotide sequences of SH3P1, SH3P2, SH3P3,
SH3P4, SH3P5, SH3P6, SH3P7, SH3P8, SH3P9, SH3P10, SH3P11,
SH3P12, SH3P13, and SH3P14, the mouse genes identified by
screening the 16 day mouse embryo cDNA expression library with
the peptides pSrcCII and pCort, are shown in Figures 18, 20,
22, 24, 26, 28, 30, 32, 34, 38, 40, 42A and B, 44, and 46A and
B, respectively. The corresponding amino acid sequences of
the mouse genes SH3P1, SH3P2, SH3P3, SH3P4, SH3P5, SH3P6,
SH3P7, SH3P8, SH3P9, SH3P10, SH3P11, SH3P12, SH3P13, and
SH3P14 are shown in Figures 19, 21, 23, 25, 27, 29, 31, 33,
35, 39, 41, 43, 45, and 47, respectively.

The nucleotide sequences of SH3P9, SH3P14, SH3P17, 
and SH3P18, human genes identified by screening the human bone 
marrow and human prostate cancer cDNA expression libraries 
with the peptide T12SRC.1, are shown in Figures 36, 48, 50, 
and 52, respectively. The corresponding amino acid sequences 
of the human genes SH3P9, SH3P14, SH3P17, and SH3P18 are shown 
in Figures 37, 49, 51, and 53, respectively. 

Two genes, SH3P9 and SH3P14, were isolated from both 
mouse and human libraries. 

The sequences of SH3P15 and SH3P16 are not shown. 
SH3P15 is Lyn and SH3P16 is Fyn. 

Figure 54 shows the nucleotide sequence of clone 55, 
a novel human gene identified and isolated from a human bone 
marrow cDNA library (described in Section 6.1) using as 
recognition units a mixture of T12SRC.4 and pCort (described 
in Section 6.1) and the methods described in Section 6.1. 

Figure 55 shows the amino acid sequence of clone 55. 

Figure 56 shows the nucleotide sequence of clone 56, 
a novel human gene identified and isolated from a human bone 
marrow cDNA library (described in Section 6.1) using as 
recognition units a mixture of T12SRC.4 and pCort (described 
in Section 6.1) and the methods described in Section 6.1. 

Figure 57 shows the amino acid sequence of clone 56.

Figure 58A shows the nucleotide sequence from
position 1-1720 and Figure 58B shows the nucleotide sequence
from position 1720-2873 of clone 65, a novel human gene
identified and isolated from a human bone marrow cDNA library
(described in Section 6.1) using as recognition units a
mixture of P53BP2.Con and Nck1.Con3 and the methods described
in Section 6.1. P53BP2.Con and Nck1.Con3 are peptides, the
amino acid sequences of which are
biotin-SFAAPARPPVPPRKSRPGG-NH2 (SEQ ID NO:201) and
biotin-SFSFPPLPPAPGG-NH2 (SEQ ID NO:202), respectively. The
sequences of P53BP2.Con and Nck1.Con3 are consensus sequences
of recognition units that bind to the SH3 domains of p53bp2
and Nck, respectively.

Figure 59 shows the amino acid sequence of clone 65.

Figure 60 shows the nucleotide sequence of clone 34,
a novel human gene identified and isolated from a human
prostate cancer cDNA library (described in Section 6.1) using
as recognition units a mixture of T12SRC.1 and T12SRC.4
(described in Section 6.1) and the methods described in
Section 6.1.

Figures 61A and 61B show the amino acid sequence of
clone 34.

Figure 62 shows the nucleotide sequence of clone 41,
a novel human gene identified and isolated from a human bone
marrow cDNA library (described in Section 6.1) using as
recognition units a mixture of PXXP.NCK.S1/4 and
PXXP.ABL.G1/2M and the methods described in Section 6.1.
PXXP.NCK.S1/4 and PXXP.ABL.G1/2M are peptides, the amino acid
sequences of which are biotin-SRSLSEVSPKPPIRSVSLSR-NH2 (SEQ ID
NO:222) and biotin-SRPPRWSPPPVPLPTSLDSR-NH2 (SEQ ID NO:223),
respectively. PXXP.NCK.S1/4 and PXXP.ABL.G1/2M bind to the
SH3 domains of Nck and Abl, respectively.

Figures 63A and 63B show the amino acid sequence of
clone 41.

Figure 64 shows the nucleotide sequence of clone 53,
a novel human gene identified and isolated from a human
prostate cancer cDNA library (described in Section 6.1) using
as recognition units a mixture of PXXP.NCK.S1/4 and
PXXP.ABL.G1/2M and the methods described in Section 6.1.

Figures 65A and 65B show the amino acid sequence of
clone 53.

Figures 66A and 66B show the nucleotide and amino
acid sequence of clone 5, a novel human gene identified and
isolated from a HeLa cell cDNA library using as recognition
units a mixture of T12SRC.1 and T12SRC.4 (described in Section
6.1) and the methods described in Section 6.1.

6.2. Use of Peptides Resembling SH3 Domain Binding
Sequences as Recognition Units


We inspected a number of published amino acid
sequences and identified proline-rich stretches of amino acids
that resembled consensus SH3 domain binding sequences.
Peptides comprising these proline-rich sequences were
synthesized and tested by the methods of the present invention
for their ability to specifically bind to the novel SH3
domains described in Sections 6.1 and 6.1.1. Purified SH3
domain-containing clones were spotted on a lawn of Y1090 host
cells, grown for an appropriate amount of time, and plaque
filter lifts were screened with biotinylated peptides
complexed with streptavidin-alkaline phosphatase as described
in Section 6.1.
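
Inspecting sequences for proline-rich stretches can be automated in a
rough way. The sketch below scans for the P-x-x-P core that is commonly
associated with SH3 ligands; the text does not state the rule that was
actually used to pick candidate stretches, so the motif definition here
is an assumption. The example peptides are taken from this
specification (without their biotin tags).

```python
import re

# Lookahead so that overlapping P-x-x-P windows are all reported.
# The P-x-x-P core is an assumed search rule, not the authors' stated criterion.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence: str):
    """Return (0-based position, matched 4-mer) for every PxxP window."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


peptides = {
    "pSrcCII": "SGSGGILAPPVPPRNTR",
    "pAbl": "SGSGSRPPRWSPPPVPLPTSLDSR",
    "CaM": "STVPRWIEDSLRGGAARAQTRLASAK",  # calmodulin-binding control peptide
}
for name, seq in peptides.items():
    print(name, find_pxxp(seq))
```

A scan like this only nominates candidates; whether a stretch actually
binds a given SH3 domain still has to be tested as described above.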

The results are shown in Figures 12 and 13. As can
be seen, in many cases the synthesized peptides were able to
bind to the novel SH3 domains. This indicates that those
synthesized peptides could have been used to identify those
novel SH3 domains from sources of polypeptides.



6.3. Valency of Peptide Recognition Units Affects
Specificity of Recognition Units

6.3.1. Preconjugation of Peptide Recognition Units
with Streptavidin-Alkaline Phosphatase
Increases Affinity of the Recognition Units for
Targets

As a preliminary test of the effect of the valency
of peptide recognition units on the ability of those
recognition units to be used as probes to detect SH3 domains,
biotinylated peptides that had been previously shown to bind
the SH3 domains of either Src or Abl were tested for their
ability to bind their respective SH3 domain when either
preconjugated with streptavidin-alkaline phosphatase (SA-AP)
or not so preconjugated. GST-SrcSH3 and GST-AblSH3 fusion
proteins (produced as described in Sparks et al., 1994, J.
Biol. Chem. 269:23853-23856) were resolved by 10% SDS-PAGE and
transferred to Immobilon D nylon membranes (Millipore, New
Bedford, MA). The membranes were incubated in blocking
solution for 1 hr at 25 °C and then incubated overnight at
4 °C with either biotinylated Src SH3 domain or biotinylated
Abl SH3 domain binding peptides in either multivalent (SA-AP)
or monovalent format. The filters were washed three times (15
min each wash) in PBS/T and incubated with NBT and BCIP for
color development. See Section 6.1 for further details of the
detection process.

The results are shown in Figure 14. In panels A,
the biotinylated peptides were preconjugated with SA-AP and
then allowed to bind to the immobilized SH3 domains.
Preconjugation was as described in Section 6.1. In panels B,
the peptides were first allowed to bind to the immobilized SH3
domains and then the bound peptides were detected by adding
SA-AP. In both cases, color development was as in Section
6.1. The sequences of the peptides used were:
Biotin-SGSGGILAPPVPPRNTR (SEQ ID NO:1) for the Src specific
peptide and Biotin-SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41) for
the Abl specific peptide. The results shown in Figure 14
demonstrate that preconjugation with SA-AP dramatically
increases the strength of the signal detected.

6.3.2. Preconjugation of Peptide Recognition Units
with Streptavidin-Alkaline Phosphatase Results
in Recognition of a Variety of SH3 Domains

Two µg of each of a panel of GST-SH3 domain fusion
proteins were transferred to Immobilon D nylon membranes
(Millipore, New Bedford, MA) using a dot-blot apparatus.
Biotinylated Src, Abl, or Cortactin SH3 domain-binding
peptides were preconjugated to SA-AP and incubated with the
filter; an alkaline phosphatase-driven color reaction was used
to detect peptide binding. The panel of immobilized proteins
was also reacted with a polyclonal anti-GST antibody
(Pharmacia, Piscataway, NJ). Sequences of the Src, Abl, and
Cortactin-binding peptides were Biotin-SGSGVLKRPLPIPPVTR (SEQ
ID NO:42), Biotin-SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41), and
Biotin-SGSGSRLGEFSKPPIPQKPTWMSR (SEQ ID NO:43), respectively.

As can be seen from the results shown in Figure 15,
the preconjugated biotinylated peptides recognized not only
their original target SH3 domains, but related domains as
well. The Src peptide recognized the SH3 domains of Yes and
Cortactin as well as the SH3 domain of Src; the Abl peptide
recognized the Cortactin SH3 domain as well as the Abl SH3
domain; and the Cortactin peptide recognized the Src, Yes,
Abl, Crk, and C-terminal Grb2 SH3 domains as well as the
Cortactin SH3 domain.

The above experiment was performed utilizing SH3
domains that had been immobilized on nylon membranes. The
following demonstrates that preconjugation with streptavidin
also permits peptide recognition units to recognize a variety
of SH3 domains when those domains are immobilized in the wells
of a microtiter plate.

Five different peptide recognition units (pAbl,
pPLC, pCrk, pSrcCI, pSrcCII) were tested in either multivalent
or monovalent format for their ability to bind to seven
different SH3 domains (Src, Abl, PLCγ, Crk, Cortactin, Grb2N,
Grb2C) in an ELISA. The sequences of these peptides were as
follows: pAbl, SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41); pPLC,
SGSGSMPPPVPPRPPGTLGG (SEQ ID NO:66); pCrk,
SGSGNYVNALPPGPPLPAKNGG (SEQ ID NO:67); pSrcCI,
SGSGVLKRPLPIPPVTR (SEQ ID NO:42); pSrcCII, SGSGGILAPPVPPRNTR
(SEQ ID NO:1). These peptides were biotinylated as in Section
6.1.

The SH3 domains were produced as GST-SH3 fusion
proteins as described in Sparks et al., 1994, J. Biol. Chem.
269:23853-23856. Their purity and concentration were
confirmed by SDS-PAGE and Bradford protein assays,
respectively. The GST-SH3 fusion proteins were immobilized in
the wells of microtiter plates as follows: Two micrograms of
each GST-SH3 fusion protein were incubated in wells of a flat
bottom enzyme-linked immunosorbent assay (ELISA) microtiter
plate (Costar, Cambridge, MA) in 100 mM NaHCO3 for 1 hr at
25 °C. One volume of SuperBlock blocking buffer (Pierce
Chemical Co., Rockford, IL) was added to each well and
incubated for an additional 30 min. Plates were washed three
times with PBS/0.1% Tween-20/0.1% bovine serum albumin (BSA).
Immobilized proteins were detected with SH3 domain-binding
peptides in multivalent or monovalent formats using
streptavidin-horseradish peroxidase (SA-HRP; Sigma Chemical
Co., St. Louis, MO). For complexation of the biotinylated
peptides and SA-HRP, peptide and SA-HRP concentrations were as
described for SA-AP complexation in Section 6.1, but all
incubations and washes were in PBS/0.1% Tween-20/0.1% BSA.
Plates were washed five times before colorimetric reaction and
before the addition of SA-HRP (monovalent format). The amount
of bound SA-HRP was evaluated with the addition of 100 µl
horseradish peroxidase substrate
[2,2'-Azino-Bis(3-Ethylbenzthiazoline-6-Sulfonic Acid) (ABTS),
0.05% hydrogen peroxide, 50 mM sodium citrate, pH 5.0]. After
5-30 minutes of reaction time, the optical densities (OD) of
the microtiter plate wells were measured with a microtiter
plate scanner (Molecular Devices, Sunnyvale, CA) set for 405
nm wavelength. The results are shown in Figure 8. From Figure
8 it can be seen that the tetravalent (complexed) peptides
display both increased affinity and broadened specificity
toward SH3 targets. Binding of complexed peptides was,
however, still restricted to SH3 domains; the complexes bind
to neither GST (Figure 8) nor other unrelated proteins (data
not shown). Thus, precomplexation with SA-AP decreases the
specificity of the peptide recognition units but does not make
the peptides non-specific. Rather, the peptides, when
precomplexed,




wo 96/31625 



PCT/US96/04454 



recognize a variety of SH3 domains in addition to their target
domains.
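
One way to turn OD405 readings from an ELISA of this kind into
bound/not-bound calls is to compare each well with the GST-only control
well. The sketch below does that with a fixed ratio cut-off. Both the
cut-off and the readings are hypothetical values chosen for
illustration; they are not data from Figure 8, and the text does not
describe a scoring rule.

```python
def call_binders(od_by_target, gst_control_od, min_ratio=3.0):
    """Return targets whose OD405 is at least min_ratio times the GST-only well.

    min_ratio is an assumed, illustrative threshold.
    """
    if gst_control_od <= 0:
        raise ValueError("control OD must be positive")
    return sorted(
        target
        for target, od in od_by_target.items()
        if od / gst_control_od >= min_ratio
    )


# Hypothetical readings for one peptide in the two formats.
monovalent = {"Src": 0.45, "Abl": 0.08, "Crk": 0.06}
multivalent = {"Src": 1.60, "Abl": 0.40, "Crk": 0.09}
gst_only = 0.05

print(call_binders(monovalent, gst_only))
print(call_binders(multivalent, gst_only))
```

With these invented numbers the multivalent format picks up an
additional SH3 domain, which is the kind of broadened recognition the
paragraph above describes.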

6.3.3. Preconjugation of Peptide Recognition Units
with Streptavidin-Alkaline Phosphatase Results
in Recognition of a Variety of Expressed cDNA
Clones

Lambda phage clones of genes containing a variety of
SH3 domains were isolated from screens of a 16 day mouse
embryo cDNA expression library (Novagen, Madison, WI). For a
description of the isolation of these cDNA clones, see Section
6.1. Phage particles corresponding to individual lambda phage
cDNA recombinants were spotted onto 2xYT-1.5% agar petri
plates onto which had been poured 3 ml of 2xYT-0.8% agarose
with 100 µl of a BL21(DE3)pLysE E. coli culture grown
overnight. After a 6 hr incubation at 37 °C, expression of
the cDNA segments was induced with IPTG-soaked nitrocellulose
filters. After overnight incubation, the expressed proteins
had been transferred to the filters and the filters were then
incubated with either biotinylated SH3 domain-binding peptides
preconjugated to SA-AP or a monoclonal antibody recognizing
the T7-Tag fusion peptide (aT7.10Mab; Novagen, Madison, WI).
This antibody was used as a positive control since it
recognized an epitope expressed by all the clones (part of the
φ10 leader sequence common to all λEXlox recombinants).
Sequences of pSrcI, pSrcII, Cortactin, and CaM (Calmodulin
binding) peptides were Biotin-SGSGVLKRPLPIPPVTR (SEQ ID
NO:42), Biotin-SGSGGILAPPVPPRNTR (SEQ ID NO:1),
Biotin-SGSGSRLGEFSKPPIPQKPTWMSR (SEQ ID NO:43), and
Biotin-STVPRWIEDSLRGGAARAQTRLASAK (SEQ ID NO:44),
respectively.

The results are shown in Figure 16. From Figure 16
it can be seen that precomplexation with SA-AP decreases the
specificity of the peptide recognition units but does not make
the peptides non-specific; none of the peptides react in a
significant fashion with two negative control sequences,
α-actinin and calmodulin (CaM). Rather, the peptides, when
precomplexed, recognize a variety of SH3 domain-containing
cDNA clones in addition to clones containing their target
domains.

6.4. Characterization of cDNA Clone-Encoded Proteins

6.4.1. Production of cDNA Clone-Encoded Proteins

Purified DNA from all positive cDNA clones (ca. 18-20
positive clones per recognition unit) was used to transform
chemical-competent BL21 cells (Hanahan et al., 1983, J. Mol.
Biol. 166:557-580, the complete disclosure of which is
incorporated by reference herein).

Colonies that appeared after growth overnight at
37 °C on 2xYT agar plates containing 100 µg/ml ampicillin were
used to inoculate 4 ml cultures of 2xYT/amp. After 7 hours of
incubation at 37 °C with shaking, IPTG was added to each
culture to a final concentration of 100 µM. After an
additional 2 hours of incubation, 1 ml of each culture was
collected and centrifuged to pellet the cells. Cell pellets
were resuspended in 400 µl 1x SDS/DTT loading buffer and
boiled at 100 °C for 5 min. The resulting cell lysates were
subjected to Sodium Dodecyl Sulfate-Polyacrylamide Gel
Electrophoresis (SDS-PAGE) on an 8% acrylamide gel. Gels were
either Coomassie stained or transferred to Immobilon D
membrane (Millipore) and blotted (Towbin et al., 1979, Proc.
Natl. Acad. Sci. 76:4350-4354).

6.5. Materials Used in Sections 6.1, 6.2, 6.3.1, 6.3.2,
6.3.3 and 6.4.1



Blocking solution

    Hepes (pH 8)        20 mM
    MgCl2               5 mM
    KCl
    Dithiothreitol      5 mM
    Milk Powder         5% w/v



2xYT media (1 L)

    Bacto tryptone      16 g
    Yeast Extract       10 g
    NaCl                5 g

2xYT agar plates
    2xYT + 15 g agar/L

2xYT top agarose (0.8%)
    2xYT + 8 g agarose/L

SDS/DTT loading buffer
(10 mL of 5x solution)

    0.5 M Tris base             0.61 g
    8.5% SDS                    0.85 g
    27.5% sucrose               2.75 g
    100 mM DTT                  0.154 g
    0.03% Bromophenol Blue      3.0 mg
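
The masses in the 5x SDS/DTT loading buffer recipe follow directly from
the stated concentrations and the 10 mL batch volume. The short check
below recomputes them. The formula weights for Tris base and DTT are
standard catalogue values supplied here as assumptions, since the recipe
itself does not list them.

```python
# Formula weights (g/mol); standard values, not given in the recipe.
TRIS_BASE_MW = 121.14
DTT_MW = 154.25

BATCH_ML = 10.0  # 10 mL of 5x solution, as stated in the recipe


def grams_for_molar(molarity_mol_per_l: float, mw_g_per_mol: float, volume_ml: float) -> float:
    """Mass needed for a molar concentration in the given volume."""
    return molarity_mol_per_l * mw_g_per_mol * volume_ml / 1000.0


def grams_for_percent_wv(percent: float, volume_ml: float) -> float:
    """Mass needed for a % (w/v) concentration: grams per 100 mL."""
    return percent / 100.0 * volume_ml


tris_g = grams_for_molar(0.5, TRIS_BASE_MW, BATCH_ML)
dtt_g = grams_for_molar(0.100, DTT_MW, BATCH_ML)
sds_g = grams_for_percent_wv(8.5, BATCH_ML)
sucrose_g = grams_for_percent_wv(27.5, BATCH_ML)
dye_mg = grams_for_percent_wv(0.03, BATCH_ML) * 1000.0

print(round(tris_g, 2), round(sds_g, 2), round(sucrose_g, 2),
      round(dtt_g, 3), round(dye_mg, 1))
```

The same two helper functions can be reused to scale the recipe to a
different batch volume.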



Overnight cell cultures:
Inoculate media with one isolated
colony of appropriate cell type and
incubate at 37 °C O/N with shaking

BL21(DE3)pLysE
    2xYT media
    maltose             0.2%
    MgSO4               10 mM
    Chloramphenicol     25 µg/ml

BM25.8
    2xYT media
    maltose             0.2%
    MgSO4               10 mM
    Chloramphenicol     34 µg/ml
    Kanamycin           50 µg/ml



6.6. Other Functional Domains and Recognition Units

In a manner similar to that described above for SH3
domains, recognition units directed to other functional
domains of interest can be chosen for use in the present
method. For example, as recognition units for a study of GST
functional domains, the following GST-binding peptides can be
used to screen a plurality of polypeptides: Class I -
CWSEWDGNEC (SEQ ID NO:46), CGQWADDGYC (SEQ ID NO:47),
CEQWDGYGAC (SEQ ID NO:48), CWPFWDGSTC (SEQ ID NO:49),
CMIWPDGEEC (SEQ ID NO:50), CESQWDGYDC (SEQ ID NO:51),
CQQWKEDGWC (SEQ ID NO:52), or CLYQWDGYEC (SEQ ID NO:53);
Class II - CMGDNLGDDC (SEQ ID NO:54), CMGDSLGQSC (SEQ ID
NO:55), CMDDDLGKGC (SEQ ID NO:56), CMGENLGWSC (SEQ ID NO:57),
or CLGESLGWMC (SEQ ID NO:58).

Moreover, the following SH2-binding peptides can be
used according to the methods of the present invention to
identify SH2 domain-containing polypeptides: GDGYEEISP (SEQ ID
NO:59) (for Src family), GDGYDEPSP (SEQ ID NO:60) (for Nck),
GDGYDHPSP (SEQ ID NO:61) (for Crk), GDGYVIPSP (SEQ ID NO:62)
(for PLCγN), GDGYQNYSP (SEQ ID NO:63) (for PLCγC), GDGYMAMSP
(SEQ ID NO:64) (for p85PI3KN and p85PI3KC), or GDGQNYSP (SEQ
ID NO:65) (for Grb2). See Songyang et al., Cell 72:767-778,
the complete disclosure of which is incorporated by reference
herein.

Further, polypeptides with a "PH" functional domain
(analogous to the proteins Vav, Bcr, mSos, PLCδ, Atk, or
Pleckstrin) can be identified using PH-binding peptides, such
as those described by Mayer et al., Cell 73:629-630, the
complete disclosure of which is incorporated by reference
herein.

Other recognition units can be readily contemplated,
including other synthetic, semisynthetic, or naturally derived
molecules.

The present invention is not to be limited in scope
by the specific embodiments described herein. Indeed, various
modifications of the invention in addition to those described
herein will become apparent to those skilled in the art from
the foregoing description and accompanying figures. Such
modifications are intended to fall within the scope of the
appended claims.

Various publications are cited herein, the
disclosures of which are incorporated by reference in their
entireties.

WHAT IS CLAIMED IS:

1. A method of identifying a polypeptide comprising a
functional domain of interest comprising:

(a) contacting a multivalent recognition unit
complex with a plurality of polypeptides; and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

2. The method of claim 1 in which said plurality of
polypeptides is from a polypeptide expression library.

3. The method of claim 1 in which said plurality of
polypeptides is obtained from a virus.

4. The method of claim 2 in which said expression
library is a cDNA expression library.

5. The method of claim 2 in which said expression
library is a genomic DNA library.

6. The method of claim 2 in which said expression
library is a recombinant bacteriophage library.

7. The method of claim 6 in which said recombinant
bacteriophage library is a recombinant M13 library.

8. The method of claim 2 in which said expression
library is a recombinant plasmid or cosmid library.

9. The method of claim 1 in which the recognition unit
is a peptide.

10. The method of claim 1 in which said recognition unit
is a peptide having less than about 140 amino acid residues.

11. The method of claim 1 in which said recognition unit
is a peptide having less than about 100 amino acid residues.

12. The method of claim 1 in which said recognition unit
is a peptide having less than about 70 amino acid residues.

13. The method of claim 1 in which said recognition unit
is a peptide having about 6 to 60 amino acid residues.

14. The method of claim 1 in which said recognition unit
is a peptide having 20 to 50 amino acid residues.

15. The method of claim 1 in which the valency of the
recognition unit in the complex is at least two.

16. The method of claim 9 in which the valency of the
recognition unit in the complex is at least two.

17. The method of claim 1 in which the valency of the
recognition unit in the complex is at least four.

18. The method of claim 9 in which the valency of the
recognition unit in the complex is at least four.

19. The method of claim 17 in which the recognition unit
complex is a complex comprising (a) avidin or streptavidin,
and (b) biotinylated recognition units.

20. The method of claim 18 in which the recognition unit
complex is a complex comprising (a) avidin or streptavidin,
and (b) the biotinylated peptides.

21. The method of claim 2 in which said identifying step
comprises selecting a positive clone, which harbors a DNA
construct encoding a polypeptide having a selective affinity
for said recognition unit and which polypeptide includes the
functional domain of interest or a functional equivalent
thereof.

22. The method of claim 21 which further comprises
determining the coding sequence of said DNA construct.

23. The method of claim 22 which further comprises
deducing an amino acid sequence from said coding sequence.

24. The method of claim 1 in which said contacting step
comprises immobilizing said recognition unit complex on a
solid support and bringing a solution containing said
plurality of polypeptides in contact with said immobilized
recognition unit complex.

25. The method of claim 1 in which said contacting step
comprises separating said plurality of polypeptides and
bringing a solution of said recognition unit complex in
contact with said separated polypeptides.

26. The method of claim 1 in which said identifying step 
includes selecting a polypeptide, among said plurality of 

2 0 polypeptides, having a selective affinity for said recognition 
unit and determining the amino acid sequence of said 
polypeptide. 

27. The method of claim 1 in which said plurality of 
25 polypeptides is immobilized on a solid support. 

28. The method of claim 27 in which said contacting step 
comprises contacting said solid support with a solution 
containing said recognition unit complex. 

30 

29. The method of claim 28 which further comprises 
washing away any unbound recognition unit complex. 

30. The method of claim 29 which further comprises
detecting any recognition unit complex that remains bound to
said solid support.

31. The method of claim 1 in which said selective
binding affinity is on the order of about 1 nM to about 1 mM.

32. The method of claim 1 in which said selective
binding affinity is on the order of about 10 nM to about 100
µM.

33. The method of claim 1 in which said selective
binding affinity is on the order of about 100 nM to about 10
µM.

34. The method of claim 1 in which said selective
binding affinity is on the order of about 100 nM to about 1
µM.
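
The four affinity ranges of claims 31 to 34 are nested, each narrower than the one before. The Python sketch below makes that nesting explicit by reading the ranges as 1 nM to 1 mM, 10 nM to 100 µM, 100 nM to 10 µM, and 100 nM to 1 µM. That reading of the units, the treatment of "on the order of about" as hard limits, and the name `claims_covering` are all assumptions for illustration.

```python
# Claimed ranges expressed in nanomolar (assumed reading of the units).
CLAIMED_RANGES_NM = {
    31: (1, 1_000_000),  # about 1 nM to about 1 mM
    32: (10, 100_000),   # about 10 nM to about 100 uM
    33: (100, 10_000),   # about 100 nM to about 10 uM
    34: (100, 1_000),    # about 100 nM to about 1 uM
}


def claims_covering(affinity_nm):
    """Return the claim numbers whose range contains the given affinity (nM)."""
    return [
        claim
        for claim, (low, high) in sorted(CLAIMED_RANGES_NM.items())
        if low <= affinity_nm <= high
    ]


for value in (5, 500, 50_000):
    print(value, "nM ->", claims_covering(value))
```

Under these assumptions an affinity of 500 nM falls within all four ranges, while 5 nM falls only within the broadest range of claim 31.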

35. The method of claim 9 in which said peptide is 
chosen from a random peptide library. 

36. A method of identifying a polypeptide comprising a
functional domain of interest comprising:

(a) contacting a multivalent recognition unit
complex, which complex comprises (i) avidin or streptavidin,
and (ii) biotinylated recognition units, with a plurality of
polypeptides from a cDNA expression library, in which the
recognition unit is a peptide having in the range of 6 to 60
amino acid residues; and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

37. The method of claim 4 or 36 in which the cDNA
expression library is a human cDNA expression library.

38. The method of claim 36 in which the peptide is
previously identified by a method comprising screening a
random peptide library to identify a peptide having selective
binding affinity for the functional domain of interest or a
functional equivalent thereof.

39. The method of claim 36 in which the functional
domain of interest is a domain selected from the group
consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo,
Notch/ankyrin repeat, zinc finger, leucine zipper, and
helix-turn-helix.

40. The method of claim 1 in which the functional
domain of interest is a domain selected from the group
consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo,
Notch/ankyrin repeat, zinc finger, leucine zipper, and
helix-turn-helix.

41. The method of claim 1, 37, or 38 in which the
functional domain of interest is an SH3 domain.

42. A method of identifying a polypeptide comprising an
SH3 domain of interest comprising:

(a) contacting a multivalent recognition unit
complex, which complex comprises (i) avidin or streptavidin,
and (ii) biotinylated recognition units, with a plurality of
polypeptides from a cDNA expression library, in which the
recognition unit is a peptide having in the range of 6 to 60
amino acid residues and which selectively binds an SH3 domain;
and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

43. The method of claim 1 in which the functional domain 
of interest comprises a catalytic site. 

44. The method of claim 43 in which said catalytic site
corresponds to that found in glutathione S-transferase.

45. A method of identifying a polypeptide comprising a
functional domain of interest or a functional equivalent
thereof comprising:

(a) screening a random peptide library to identify a
peptide that selectively binds a functional domain of
interest; and

(b) screening a cDNA or genomic expression library
with said peptide or a binding portion thereof to identify a
polypeptide that selectively binds said peptide.

46. The method of claim 45 in which the screening step
(b) is carried out by use of said peptide in a multivalent
peptide complex.

47. The method of claim 46 in which the screening step
(b) is carried out by use of said peptide in a complex
comprising streptavidin and biotinylated peptide.

48. The method of claim 46 in which the screening step
(b) is carried out by use of said peptide in the form of
multiple antigen peptides (MAP).

49. The method of claim 46 in which the screening step
(b) is carried out by use of said peptide cross-linked to
bovine serum albumin or keyhole limpet hemocyanin.

50. A method of identifying a polypeptide comprising a
functional domain of interest or a functional equivalent
thereof comprising:

(a) screening a random peptide library to identify a
plurality of peptides that selectively bind a functional
domain of interest;

(b) determining at least part of the amino acid
sequences of said peptides;

(c) determining a consensus sequence based upon the
determined amino acid sequences of said peptides; and

(d) screening a cDNA or genomic expression library
with a peptide comprising the consensus sequence to identify a
polypeptide that selectively binds said peptide.

51. The method of claim 50 in which the screening step
(d) is carried out by use of said peptide in a multivalent
peptide complex.
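
Step (c) of claim 50 calls for a consensus sequence derived from the sequenced peptides. One simple way to do this, sketched below in Python, is a column-by-column majority vote over peptides that are already aligned and of equal length. The example peptides are invented for illustration, and the function name `consensus_sequence`, the `min_fraction` threshold, and the `x` wildcard are assumptions; the claims do not prescribe any particular procedure.

```python
from collections import Counter


def consensus_sequence(peptides, min_fraction=0.6, wildcard="x"):
    """Derive a consensus from pre-aligned, equal-length peptide sequences.

    A position keeps its most common residue when that residue occurs in at
    least min_fraction of the peptides; otherwise the wildcard is written.
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")

    consensus = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        keep = count / len(peptides) >= min_fraction
        consensus.append(residue if keep else wildcard)
    return "".join(consensus)


# Hypothetical binders from a random peptide library screen.
binders = ["RPLPPLP", "RALPPLP", "RPLPSLP", "KPLPPLP"]
print(consensus_sequence(binders))
print(consensus_sequence(binders, min_fraction=1.0))
```

With the default threshold the sketch reports RPLPPLP; requiring unanimity at every position gives xxLPxLP, which shows how the threshold trades specificity against coverage when choosing the peptide for step (d).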

52. A method of identifying a polypeptide comprising a
functional domain of interest or a functional equivalent
thereof comprising:

(a) screening a random peptide library to identify a
first peptide that selectively binds a functional domain of
interest;

(b) determining at least part of the amino acid
sequence of said first peptide;

(c) searching a database containing the amino acid
sequences of a plurality of expressed natural proteins to
identify a protein containing an amino acid sequence
homologous to the amino acid sequence of said first peptide;
and

(d) screening a cDNA or genomic expression library
with a second peptide comprising the sequence of said protein
that is homologous to the amino acid sequence of said first
peptide.
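
The database search of step (c) in claim 52 can be pictured with a deliberately simple Python sketch: slide the first peptide along each protein and keep the ungapped window with the highest fraction of identical residues. The protein names and sequences, the function names, and the 70% identity cutoff are hypothetical. A practical search would normally use a scored alignment with substitution matrices and gaps, so this is a stand-in for the idea, not the method of the claim.

```python
def best_window(peptide, protein):
    """Return (identity, start) of the ungapped window most similar to peptide."""
    best_identity, best_start = 0.0, -1
    n = len(peptide)
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        identity = sum(a == b for a, b in zip(peptide, window)) / n
        if identity > best_identity:
            best_identity, best_start = identity, start
    return best_identity, best_start


def search_database(peptide, database, min_identity=0.7):
    """List (name, start, matching segment, identity) for proteins above the cutoff."""
    hits = []
    for name, sequence in database.items():
        identity, start = best_window(peptide, sequence)
        if identity >= min_identity:
            segment = sequence[start:start + len(peptide)]
            hits.append((name, start, segment, identity))
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


# Hypothetical protein database and first peptide.
database = {"protA": "MKTAYRPLPPLPGGS", "protB": "MSSGGGAAAKKK"}
for hit in search_database("RPLPSLP", database):
    print(hit)
```

The matching segment returned for a hit (here RPLPPLP from the hypothetical protA) plays the role of the second peptide used in step (d).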



53. An assay kit comprising in one or more containers:

(a) a purified polypeptide containing a functional
domain of interest, in which the functional domain of interest
is a domain selected from the group consisting of an SH1, SH2,
SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc finger,
leucine zipper, and helix-turn-helix; and

(b) a purified recognition unit having a selective
binding affinity for said functional domain in said
polypeptide.

54. The assay kit of claim 53 in which said polypeptide
comprises an amino acid sequence selected from the group
consisting of SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32,
38, 40, 190, 192, 194, 196, 198, 200, and 221.

55. The assay kit of claim 53 in which said polypeptide
comprises an amino acid sequence selected from the group
consisting of SEQ ID NOs: 113-115, 118-121, 125-128, 133-139,
204-218, and 219.

56. The assay kit of claim 53 in which said recognition 
unit is a peptide. 

57. The assay kit of claim 53 in which said polypeptide
or recognition unit is labeled.

58. The assay kit of claim 57 in which said polypeptide
or recognition unit is labeled with an enzyme.

59. The assay kit of claim 57 in which said polypeptide
or recognition unit is labeled with an epitope.

60. The assay kit of claim 57 in which said polypeptide
or recognition unit is labeled with a chromogen.

61. The assay kit of claim 57 in which said polypeptide
or recognition unit is labeled with biotin.

62. The assay kit of claim 53 in which said polypeptide
or recognition unit is immobilized on a solid support.

63. An assay kit comprising in containers:

(a) a plurality of purified polypeptides, each
polypeptide in a separate container and each polypeptide
containing a functional domain of interest in which the
functional domain of interest is a domain selected from the
group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo,
Notch/ankyrin repeat, zinc finger, leucine zipper, and
helix-turn-helix; and

(b) at least one recognition unit having a
selective binding affinity for said functional domain in each
of said plurality of polypeptides.

64. An assay kit comprising in one or more containers:

(a) a plurality of purified polypeptides, each
polypeptide in a separate container and each polypeptide
containing an SH3 domain; and

(b) at least one peptide having a selective
affinity for the SH3 domain in each of said plurality of
polypeptides.

65. A kit comprising a plurality of purified
polypeptides comprising a functional domain of interest, each
polypeptide in a separate container, and each polypeptide
having a functional domain of a different sequence but capable
of displaying the same binding specificity.

66. The kit of claim 65 in which the polypeptides have
an amino acid sequence selected from the group consisting of:
SEQ ID NO: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190,
192, 194, 196, 198, 200, and 221.

67. The kit of claim 65 in which the functional domain
is an SH3 domain.

68. The kit of claim 65 in which the functional domain
is an SH3 domain from a polypeptide having an amino acid
sequence selected from the group consisting of: SEQ ID NO: 8,
10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196,
198, 200, and 221.

69. A method for screening a potential drug candidate
comprising:

(a) allowing at least one polypeptide comprising a
functional domain of interest to come into contact with at
least one recognition unit having a selective affinity for
said functional domain in said polypeptide, in the presence of
an amount of a potential drug candidate, such that said
polypeptide and said recognition unit are capable of
interacting when brought into contact with one another in the
absence of said drug candidate, and in which the functional
domain of interest is a domain selected from the group
consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo,
Notch/ankyrin repeat, zinc finger, leucine zipper, and
helix-turn-helix; and

(b) determining the effect, if any, of the presence
of the amount of said drug candidate on the interaction of
said polypeptide with said recognition unit.

70. The method of claim 69 in which the effect of the
drug candidate upon multiple, different interacting
polypeptide-recognition unit pairs is determined in which at
least some of said polypeptides have a functional domain that
differs in sequence but is capable of displaying the same
binding specificity as the functional domain in another of
said polypeptides.

71. The method of claim 69 in which at least one of said
at least one polypeptide or recognition unit contains a
consensus functional domain and consensus recognition unit,
respectively.

72. The method of claim 69 in which the polypeptide is a
polypeptide identified by the method of claim 1.

73. The method of claim 69 in which the drug candidate
is an inhibitor of the polypeptide-recognition unit
interaction that is identified by detecting a decrease in the
binding of polypeptide to recognition unit in the presence of
such inhibitor.

74. A purified polypeptide comprising an SH3 domain,
said SH3 domain having an amino acid sequence selected from
the group consisting of: SEQ ID NOs: 113-115, 118-121, 125-128,
133-139, 204-218, and 219.

75. A purified polypeptide comprising an SH3 domain,
said polypeptide having an amino acid sequence selected from
the group consisting of SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24,
30, 32, 38, 40, 190, 192, 194, 196, 198, 200, and 221.

76. A purified DNA encoding an SH3 domain, said DNA
having a sequence selected from the group consisting of SEQ ID
NOs: 7, 9, 11, 17, 19, 21, 23, 29, 31, 37, 39, 189, 191, 193,
195, 197, 199, and 220.

77. A purified DNA encoding a polypeptide comprising an
amino acid sequence selected from the group consisting of: SEQ
ID NOs: 8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192,
194, 196, 198, 200, and 221.

78. A purified DNA encoding a polypeptide comprising an
amino acid sequence selected from the group consisting of: SEQ
ID NOs: 113-115, 118-121, 125-128, 133-139, 204-218, and 219.

79. A purified molecule comprising an SH3 domain of a
polypeptide having an amino acid sequence selected from the
group consisting of: SEQ ID NO: 8, 10, 12, 18, 20, 22, 24, 30,
32, 38, 40, 190, 192, 194, 196, 198, 200, and 221.

80. A fusion protein comprising (a) an amino acid
sequence comprising an SH3 domain of a polypeptide having the
amino acid sequence of SEQ ID NO: 8, 10, 12, 18, 20, 22, 24,
30, 32, 38, 40, 190, 192, 194, 196, 198, 200, or 221 joined
via a peptide bond to (b) an amino acid sequence of at least
six amino acids from a different polypeptide.

81. A purified DNA encoding the fusion protein of claim 80.

82. A nucleic acid vector comprising the DNA of claim 81.

83. A nucleic acid vector comprising the DNA of claim 76.

84. A nucleic acid vector comprising the DNA of claim 78.

85. A recombinant cell containing the nucleic acid
vector of claim 82, 83, or 84.

86. A purified nucleic acid hybridizable to a nucleic
acid having a sequence selected from the group consisting of:
SEQ ID NOs: 7, 9, 11, 17, 19, 21, 23, 29, 31, 37, 39, 189,
191, 193, 195, 197, 199, and 220.

87. A method of producing the fusion protein of claim 80
comprising culturing a recombinant cell containing a nucleic
acid vector encoding said fusion protein such that said fusion
protein is expressed, and recovering the expressed fusion
protein.

88. A method of producing the polypeptide of claim 74
comprising culturing a recombinant cell containing a nucleic
acid vector encoding said polypeptide such that said
polypeptide is expressed, and recovering the expressed
polypeptide.

89. The method of claim 69 in which said polypeptide is
a polypeptide containing an SH3 domain produced by a method
comprising:

(i) screening a peptide library with an SH3 domain
to obtain one or more peptides that bind the SH3 domain;

(ii) using one of the peptides from step (i) to
screen a source of polypeptides to identify one or more
polypeptides containing an SH3 domain;

(iii) determining the amino acid sequence of the
polypeptides identified in step (ii); and

(iv) producing the one or more novel polypeptides
containing an SH3 domain.

90. The method of claim 69 in which said polypeptide is
a polypeptide containing an SH3 domain produced by a method
comprising:

(i) screening a peptide library with an SH3 domain
to obtain a plurality of peptides that bind the SH3 domain;

(ii) determining a consensus sequence for the
peptides obtained in step (i);

(iii) producing a peptide comprising the consensus
sequence;

(iv) using the peptide comprising the consensus
sequence to screen a source of polypeptides to identify one or
more polypeptides containing an SH3 domain;

(v) determining the amino acid sequence of the
polypeptides identified in step (iv); and

(vi) producing the one or more polypeptides
containing an SH3 domain.

91. A method of determining the potential
pharmacological activities of a molecule comprising:

(a) contacting the molecule with a compound
comprising a functional domain under conditions conducive to
binding;

(b) detecting or measuring any specific binding that
occurs; and

(c) repeating steps (a) and (b) with a plurality of
different compounds, each compound comprising a functional
domain of different sequence but capable of displaying the
same binding specificity.

92. The method of claim 91 in which the functional
domain is an SH3 domain.

93. The method of claim 92 in which the compounds
comprise the SH3 domains of Src, Abl, cortactin, Phospholipase
Cγ, Nck, Crk, p53bp2, Amphiphysin, Grb2, RasGap, or
Phosphatidylinositol 3' kinase.

94. A method of identifying a compound that affects the
binding of a molecule comprising a functional domain to a
recognition unit that selectively binds to the functional
domain comprising:

(a) contacting the molecule comprising the
functional domain and the recognition unit under conditions
conducive to binding in the presence of a candidate compound
and measuring the amount of binding between the molecule and
the recognition unit and in which the functional domain of
interest is a domain selected from the group consisting of an
SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat,
zinc finger, leucine zipper, and helix-turn-helix;

(b) comparing the amount of binding in step (a) with
the amount of binding known or determined to occur between the
molecule and the recognition unit in the absence of the
candidate compound, where a difference in the amount of
binding between step (a) and the amount of binding known or
determined to occur between the molecule and the recognition
unit in the absence of the candidate compound indicates that
the candidate compound is a compound that affects the binding
of the molecule comprising a functional domain and the
recognition unit.
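
The comparison in step (b) of claim 94 reduces to simple arithmetic on two binding measurements. The Python sketch below expresses the difference as a percent change relative to binding in the absence of the candidate compound. The function names, the arbitrary signal units, and the 20% decision threshold are assumptions for illustration; the claim itself requires only that a difference be observed.

```python
def percent_change_in_binding(with_compound, without_compound):
    """Percent change in binding caused by the candidate compound.

    Negative values indicate inhibition, positive values enhancement.
    """
    if without_compound <= 0:
        raise ValueError("reference binding must be positive")
    return 100.0 * (with_compound - without_compound) / without_compound


def affects_binding(with_compound, without_compound, threshold_percent=20.0):
    """Flag a compound whose effect exceeds a chosen (hypothetical) threshold."""
    change = percent_change_in_binding(with_compound, without_compound)
    return abs(change) >= threshold_percent


# Binding signals in arbitrary units: step (a) versus the no-compound reference.
print(percent_change_in_binding(0.25, 1.0))
print(affects_binding(0.25, 1.0))
print(affects_binding(0.95, 1.0))
```

In this example a drop from 1.0 to 0.25 is a 75% decrease and is flagged, whereas a drop to 0.95 is within the assumed threshold and is not. In practice the threshold would be set from the variability of replicate measurements.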

95. The method of claim 94 in which the functional
domain is an SH3 domain.

96. The method of claim 20 in which the recognition unit
complex is a complex comprising (a) streptavidin conjugated to
alkaline phosphatase; and (b) the biotinylated peptides.

97. A method of identifying a polypeptide comprising a
functional domain of interest comprising:

(a) contacting a recognition unit that is a peptide
having 140 amino acids or fewer with a plurality of
polypeptides; and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

98. An antibody to a polypeptide comprising an amino 
acid sequence selected from the group consisting of: SEQ ID 
NOs:113-115, 118-121, 125-128, 133-139, 204-218, and 219. 

99. An antibody to a polypeptide comprising an amino acid
sequence selected from the group consisting of SEQ ID NOs: 8,
10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196,
198, 200, and 221.

100. The purified nucleic acid of claim 86 that is a
human nucleic acid encoding a polypeptide containing a
functional domain.

101. A purified protein encoded by a first nucleic acid
comprising a human cDNA or genomic sequence hybridizable to a
second nucleic acid having a sequence selected from the group
consisting of: SEQ ID NOs: 7, 9, 11, 17, 19, 21, 29, and 31.

102. The assay kit of claim 53 in which said polypeptide
comprises an amino acid sequence selected from the group
consisting of SEQ ID NOs: 6, 14, 16, 26, 28, 34, 36, 112, 116,
117, 122-124, 129-132, and 140.


1 GTGAATGCTG CAGACAGTGA CG6ATGGACA CCACTGCATT
41 GTGCTGCCTC TTGCAACA6T GTCCACCTCT GCAAGCAGCT 
81 GGTGGAAAGT GGAGCCGCTA TCHTGCCTC CACCATCAGT 
121 GACAHGAGA CT6CTGCAGA CAAGTGT6AA GAGATGGAAG 
161 AGGGATACAT CCAGIGHCC CAGHTCIGT ATGGGGTACA 
201 AGAGAAGCTG GGAGTGATGA ACAAAGGCAC CGTGTATGCT 
241 nGTGGGACT ACGAGGCCCA 6AACAGCGAT GAGCTGTCCT 
281 TCCATGAAGG GGATGCCATC ACCATCCTGA GGCGCAAAGA 
321 TGAAAACGAG ACCGAGTGGT GGTGGGCTCG TCHGGGGAC 
361 CGGGAGGGCT ACGTGCCCAA AAACHGCTG GGGHGTATC 
401 CACGGATCAA ACCCCGGCAG CGAACACTTG CCT6AACCCC 
441 CTGGAGTACC ACAGTCTCGT HGCTCCCAG GA6CTACTGG 
481 AGGAGATCCC ACTGCCCTGG GAAAACTGAA GCTAGGATGG 
521 TCTCCTGGTG CTCACTHAG CAGACAGTGT CCACAATGTG 
561 AATCCCACTT CCCAGGTGAG GCCCTCTCCA GGCTGCAGGA 
601 GCTGG 



FIG. 18 



1 VNAADSDGWT PLHCAASCNS VHLCKQLVES GAAIFASTIS 
41 DIETAADKCE EMEEGYIQCS QFLYGVQEKL GVMNKGTVYA 
81 LWDYEAQNSD ELSFHEGDAI TILRRKDENE TEWWWARLGD 
121 REGYVPKNLL GLYPRIKPRQ RTLA 

FIG. 19 



1 SGCARSGAAA ASAGLAPSCR VRVGLPRLSL VAPCSAMSKP 
41 PPKPVKPGQV KVFRALYTFE PRTPDELYFE EGDIIYITDM 
81 SDTSWWKGTC KGRTGLIPSN YVAEQAESID NPLHEAAKRG 
121 NLSWLRECLD NRVGVNGLDK AGSTALYWAC HGGHKDIVEV 
161 LFTQPNVELN QQNKLGOTAL HAAAWKGYAD IVQLLLAKGA 
201 RTDLRNNEKK LALDMATNAA CASLLKKKQQ GTDGARTLSN 
241 AEDYLDDEDS D

1 . .GAATTCAA GCTCGGGHG CGC6CGGTCC GGAGCGGCCG
41 CGGCCAGCGC AG6CTTGGC6 CCCAGHGTC GTGTGC6T6T 
81 6GGGCTCCCG CGGCTGAGCC TGGTCGCTCC GTGTAGCGCC 
121 ATGTCCAAGC CACCTCCCAA ACCGGTCAAA CCAGGGCAA6 
161 HAAAGTCTT CAGAGCTCTA TATACAHTG AACCCAGAAC 
201 TCCAGATGAA TTATACTTTG AAGAAGGAGA CAHATCTAC 
241 ATCACTGACA TGAGTGATAC CAGCTGGTGG AAAGGGACAT 
281 GCAAGGGCAG AACAGGACTG ATCCCGAGCA ACTATGTGGC 
321 TGAGCAGGCA GAATCCATTG ACAATCCATT GCATGAAGCT 
361 GCAAAAAGAG GCAACCTGAG CIGGHGAGG GAGTGCTTG6 
401 ACAACCGGGT GGGTGTGAAC GGCCTGGACA AAGCTGGAAG 
441 CACAGCCCTG TACTGGGCCT GCCACGGTGG CCATAAAGAC 
481 ATAGTGGAGG TTCIGTHAC TCAGCCGAAT GTGGAGCTGA 
521 ACCAGCAGAA TAAGCTGGGA GACACAGCTC TGCACGCGGN 
561 TGCCTGGAAG GGTTATGCAG ACAHGTCCA GTTGCTACTG 
601 GCAAAAGGTG CGAGGACAGA CHGAGAAAC AATGAGAAGA 
641 AGCTGGCCn GGACATGGCC ACCAACGCTG CCTGTGCATC 
681 GCTCCTGAAG AAGAAGCAGC AGGGAACAGA TGGGGCTCGA 
721 ACGTTAAGCA ACGCCGAGGA CTACCTCGAT GACGAAGACT
761 CAGACTGAH CCCCCCGGGG CCGCTTTGAT TGHGCCTAA 
801 ACnCnTTG CTTTTGCCAT TCCGGAGCCT GGGHGHTG 
841 CCAGAAGAGT AHGATAACT GTTGCTTTTA AAGTCTGTAT 
881 GAGCGCGACA CTGCTGCACT GTGATCTGTG AGGAGTCGH 
921 GTGAGGGTGG CTCAHCTCA CCCACGCCTT GNCAATAAGT 
961 GAAGAGATAC THGnGTAT AAAATACATA TATGCTCACC 
1001 AGG6TAAAAT AAACGAAAAA AANTTATTTC TAHTATCAA 
1041 GCTAAAAAAA AAAAGCHGG GCCCTNTTCT ATAGTGTCAC 
1081 CTAAATACTA GCTTGANCCG GNTGCTAACA AAGCCCGAAA 
1121 GGAAGCTGAG HGCTGCTGC CACCGNTGAG CAATAACTAG 
1161 CATANCCCCT TGGGGCCTCT AAACGGGTCT TGAGGGGHT 
1201 HNGNTGAAA GGAGGANCTA TTTCCGGATA ACCTGGNGTA 
1241 ATAGGGAAGA GGCCC6NACC GATCGCCCH CCCAACAGA 

FIG. 20

1 .ACTCACGNC 6GT66AGTGG TACCGGATCG AATTCAAGCC GCATCACTGG
51 CACTGGACGC CAGGGCATCT TCCCTGCCAG CTACGTGCAG ATAAACCGAG 
101 AGCCCCGGCT CAGGCITTGT GATGATGGTC CCCAGCTCCC TGCATCACCT 
151 AACCCGACAA CCACTGCTCA CCTAAGCAGC CACTCCCACC CCTCCTCAAT 
201 ACCT6TGGAC CCCACT6ACT 6G66AGGTC6 AACCTCCCCT C6ACGCTCCG 
251 CCTTTCCCTT CCCCATCACC CTCCAGGAGC CCAGATCCCA AACCCAGAGT 
301 CTCAATACCC CTGGACCAAC CCTGTCCCAT CCTCGAGCCA CCAGCCGTCC 
351 CATAAACCTG GGACCCTCCT CCCCAAACAC AGAGATACAC TGGACTCCGT 
401 ACCGG6CCAT GTACCAGTAC AGGCCCCA6A ATGAGGAC6A GCTGGAACn 
451 C6AGAGQGGG ACC6TGTGGA T6TCATGCAG CAAT6TGACG ATGGCTGGTT 
501 TGTGGGTGTC TCCCGGCGAA CTCAGAAATT IGGGACAHC CCTGGAAATT 
551 ATGTAGCCCC AGTGTGAGTG GTCTCCATGG CAGTHGGAG CCAACGAGGA 
601 TCGGGAGGG6 AGCAGTAGCA CTATGGGAGG GAGAGAGGCC TTCCATAGCC 
651 TCCTCCCCAG GACCTGTGCT CCCAGCHCT GCAGAGACCC CAGCAACTTT 
701 CCCTCCAAGC CTCCTTGAAG TCCGATTCCC ACCCCGCAAG TCACAGGCAT 
751 TCCTTTGACA GCCCCCHCA CCGCCCCTCA AATACAGACA TCTGCTTTCA 
801 TGTGGGNAAA AAAAAAAAAT TAAAAGGTGG CCCTAT 

FIG. 22 



1 RITGTGRQGI FPASYVQINR EPRLRLCDD6 PQLPASPNPT 
41 TTAHLSSHSH PSSIPVDPTD WGGRTSPRRS AFPFPITLQE 
81 PRSQTQSLNT PGPTLSHPRA TSRPINLGPS SPNTEIHWTP 
121 YRAMYQYRPQ NEDELELREG DRVDVMQQCD DGWFVGVSRR 
161 TQKFGTFPGN YVAPV 

FIG. 23 



1 MSVAGLKKQF HKATQKVSEK VGGAEGTKLD DDFKEMERKV 
41 DVTSRAVMEI MTKTIEYLQP NPASRAKLSM INTMSKIRGQ 
81 EKGPGYPQAE ALLAEAMLKF GRELGDDCNF GPALGEVGEA 
121 MRELSEVKDS LDMEVKQNFI DPLQNLHDKD LREIQHHLKK 
161 LEGRRLDFGY KKKRQGKIPD EELRQALEKF DESKEIAESS 
201 MFNLLEMDIE QVSQLSALVQ AQLEYHKQAV QILQQVTVRL 
241 EERIRQASSQ PRREYQPKPR MSLEFATGDS TQPNGGLSHT 
281 GTPKPPGVQM DQPCCRALYD LEPENEGELA FKEGDIITLT 
321 NQIOENWYEG MLHGQSGFFP INYVEILVAL PH 

FIG. 25 
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1 TTNNNNYYMM SKYSKKGKKK 
51 GGCCCGGCGA CCCCA6CCGC 
101 CGC6CAGCCT CCC6CATCCC 
151 TTCCACAAAG CCACTCAGAA 
201 CACCAAGCTC GAT6AT6ACT 
251 CCAGCAGGGC TGTGAT6GAG 
301 CCCAATCCA6 CHCCAGGGC 
351 AATCCGCGGC CAAGAGAAGG 
401 TGGCAGAGGC CATGCTCAAG 
451 nTGGTCCTG CTCTCG6TGA 
501 GGTCAAGGAC TCAnGGACA 
551 TTCAGAATCT TCATGACAAG 
601 AAGCTGGAAG GCCGACGCH 
651 CAAGATTCCA GATGAAGAAC 
701 CTAAAGAAAT CGCCGAGTCG 
751 6AACAG6TGA 6CCAGCTCTC 
801 CAAGCAGGCA GTGCAGATCC 
851 GAATAAGACA AGCTTCATCT 
901 CGGATGAGCC TAGAGHTGC 
951 TCTCTCCCAC ACAGGCACAC 
1001 CCT6CTGCCG AGCTCTGTAT 
1051 GCniTAAAG AGGGCGATAT 
1101 CTGGTATGAG GGGATGCHC 
1151 ATGTAGAAAT TCTGGTTGCT 
1201 TCACCTCCn CTGACCCAGA 
1251 CTGCnCCAA TACATCACGA 
1301 CCACACGTGC CCTGGGHGA 
1351 AGATGGTATC TTCCAAGGCC 
1401 TCTGAGACTA GCCAGGAGTC 
1451 TGTQGnCCT 6GTAACATGC 
1501 CACT6AAGAT ATTGTCTCTC 
1551 TCCATTTACA GAGGAGAAAG 
1601 ATGTGAGTCA CAGAATTGTT 
1651 CTGCTGCnT AAGCAACHG 
1701 HGTCCAAAG CACCHTGIT 
1751 TATCTATCTA TCTATCTATC 
1801 CTATCATCTA TCTATCTATC 
1851 NNTCNATCTA TCTATCTATC 
1901 CTATCTACTA TCCATCTATC 
1951 CTACTATCCA TCCATTTATC 
2001 TCTCCCTCAT ACTTCTGAGA 
2051 AAGCACHGG NAGATGAGGG 
2101 GGT6AGCAGG GTGTATGHG 
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KGKWMSGRTC GATTCAAGCC GACCAGCGGC 
CTCTCCGCAT CTGCATCTGC ATCT6CCGGC 
ATCATGTCGG TGGCAGGGCT GAAGAA6CAG 
AGTGAGTGAG AAGGTGGGAG GAGCGGAAGG 
TCAAAGAGAT GGAGAGGAAA GTGGATGTCA 
ATAATGACAA AAACGATTGA ATACCTCCAA 
TAAGCTCAGT ATGATCAACA CCAT6TCGAA 
GGCCAGGCTA CCCTCAGGCG GAA6CACTGC 
nCGGCAGGG AGCTGGGTGA TGAHGCAAC 
GGTGGGAGAA GCCATGAGGG AGCTCTCGGA 
TGGAAGTGAA GCAGAATnC ATCGACCCCC 
GATCTGAG6G AGAHCAGCA TCATCTGAAA 
AGACTHGGT TATAAGAAGA AGCGACAAG6 
TCCGCCAAGC TCTGGAGAAA HCGATGAGT 
AGCATGTTCA ACCTCHGGA GATGGATATA 
CGCACTTGTT CAGGCTCAGC TGGAGTACCA 
TGCAGCAGGT CACTGTCAGA CTGGAAGAAA 
CAGCCAAGAA GGGAATATCA GCCCAAACCA 
CACTG6AGAC A6TACTCAGC CCAACGGGG6 
CCAAACCTCC AGGTGTCCAA ATGGATCAGC 
GACnGGAAC CTGAAAATGA AGGGGAAHG 
CATCACACTC ACTAATCAGA HGACGAGAA 
ATGGCCAGTC TGGCTTTTTC CCCATCAACT 
CT6CCCCATT AGGATCCTGT GCTGGCTGGC 
TAGHAAGTT TAACCACTGC HTGGTAATG 
ATGCAGGCCG CAGTGGATGA GTCACCAAGC 
CCCGTGTGCT CCTCCAGGAG ACGCGGTGAT 
AGTGGGCCTG GTACATGCTT TAAAACACCA 
CCAGAACTGG CTTCACAGTT CTCAGGAGGC 
CTGTGAACCA CATGGCAGAA AAACTCTCCT 
ACCCAGGGGC CATCTCAAGG TCTCCAGHC 
TCCTTTTTGT TGCACTTTCC CTTCCTAAAT 
GGCAAAAACA TCCCCTCACC AGCAAGATGT 
GTCTCHGAT GCCAHAGCA AAAGTAHAA 
CACTAATATC TATCTATCTA TCTATCTATC 
TATCTATCAT CTATCTACCT ACCTATCTAC 
ATCTAHATC TATCTATCTA TCTATCTATC 
CATCTATCTA TCCATCATCT ATCTACCTAC 
TATCTATCCA TCATCTATCT ACCTACCTAT 
TATCTATCTA TCTATCTATC TATCTATCTA 
CATGGCCAGT TTTCTTCCCT CCCTGCTGH 
GGGGGGTCCC ATTTNATnC TGAGTGAGAT 
GCTGTNNTNN GGGGGTGGCC CTA 
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1 CGGGCGCGGC GGGAGCCTGG 
41 ATCAHGATC ATCGCAGATG 
81 TCCTTGGATT CGGACAGACT 
121 AAAGAGAAAG ATGTGGAAAG 
161 TCCATCACGC AGGATGATG6 
201 CTGATCCTGA nTTGTGAAT 
241 GAGATGGGGT GCTAAAACCG 
281 GAACACATCA ACAHCACAA 
321 AAGAACACCA GACGCTCAAG 
361 ACCCAAGGCT TCCCACGGCT 
401 GAGCA6GATA 6GATGGACAG 
441 ACCAGTCGAA GCTTTCCAAG 
481 GGTCCGGGGC HCGGAGGCA 
521 AGGGTGGATC AGTCTGCTGT 
561 AGACTGAGAA GCATGCCTCC 
601 CTTCGGTG6C AAATACGGTG 
641 AAGAGTGCCG TGGGCHTGA 
681 A6CATGAGTC TCAGAAAGAT 
721 CAAATATGGG AHGACAAGG 
761 GTGGGCTTTG AGTATCAAGG 
801 CCCAGAAAGA CTATGTAAAA 
841 T6TGCAGACA GACAGACAGG 
881 GACCATCAGG AGAAGCTGCA 
921 ACTATAAGAC TGGITTCGGA 
961 CGAGAGGCAG GACTCCTCCG 
1001 GAGAGAHGG CCAAGCACGA 
1041 AAGGAHCGG CGGGAAGTAT 
1081 GGACAAGAAT GCATCCACCT 
1121 CCATCTGCCT ATCAGAAGAC 
1161 CCAGCAAAAC CAGTAATATC 
1201 GGCAAAGGAG AGAGAGCAGG 
1241 GCCGAGAGAG CTCAGCGGAT 
1281 AGCAGGAG6C GCGCAGGAAG 
1321 CAAGAAGCAG ACGCCCCCTG 
1361 AHGAAGACA GACCACCCTC 
1401 CAGCTCCGH CAAGGCCGAG 
1441 ACCTGAGCCT GAGTACAGCA 
1481 GAGGCTGGCA GCCAGCAAGG 
1521 CCGTGTACGA GACTACAGAG 
1561 AGAGGATGAC ACCTACGATG 
1601 ATCACAGCCA TCGCCCTGTA 



TGGACCCTGC TTTGGCG6TA 
CCCTCATATC CACnTGGAT 
CTGAACT6CT TTTCCCAGCA 
CCTCTGCAGG CCATGCTGTG 
AGGAGCTGAT GACTGGGAGA 
GAT6TGAGT6 AAAAGGAGCA 
TGCAGGGATC GGGGCACCAG 
GCnCGAGAG AATGTCnCC 
GAGAAGGAGC TGGAAACGGG 
ATGGCGGGAA GHCGGTGTG 
ATCAGCCGTG GGCCATGAGT 
CACTGCTCAC AA6TGGACTC 
AGnCGGTGT CCAGATG6AC 
AGGCTHGAA TACCAGGGGA 
CAGAAAGACT ACTCTAGTGG 
TGCAAGCTGA CCGTGTAGAC 
CTACCAGGGC AAGACGGAGA 
TACTCCAAAG GTTTTGGTGG 
ACAAGGTGGA TAAAAGTGCT 
CAAGACAGAG AAGCACGAAT 
GGCTTTGGAG GAAAGTTTGG 
ACAAGTGTGC CCnGGCTGG 
GCTGCATGAA TCCCAAAAAG 
GGCAAATTTG GTGTTCAGTC 
CTGTGGGGTT TGATTACAAG 
GCCCCAGCAA GACTATGCCA 
GGGGTGCA6A AGGATCGGAT 
HGAAGAAGT GGTCCAGGTG 
TGTCCCCATT GAGGCCGTAA 
CGTGCTAACT HGAAAACCT 
AGGACAGGCG GAAGGCAGAA 
GGCCAAAGAA AGACAGGAGC 
CTGGAAGAGC AAGCCA6AGC 
CATCCCCTAG TCCTCAACCA 
CAGCCCCATC TATGAGGATG 
CCGAGCTACC GAGGTAGCGA 
TCGAGGCC6C AGGCATTCCT 
CCTGACCTAT ACATCAGAGC 
GCTCCTGGCC ACTATCAAGC 
GGTATGAGAG TGACCTGGGC 
TGACTACCAG GCTGCTGGCG 
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1641 ATGATGAGAT CTCCTTTGAC CCTGATGACA TCATCACCAA 
1681 CATAGAAATG AHGACGATG GCTGGTGGCG T6GGGTGTGC 
1721 AAQGGCA6AT ACGGGCTCH CCCAGCCAAC TATGTGGA6C 
1761 TGCGGCAGTA GGGCTGCCAC CCAGAGCCTA CCGGCACCAG 
1801 CACAGGGTTC ACACTACAGA GCATCTGCGT GIGTHGAGT 
1841 TGGTTTCTGC TTCCGTTTCT GinHG 
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1 MWKASAGHAV SITQDDGGAD OWETDPDFVN DVSEKEQRWG 
41 AKTVQGS6HQ EHINIHKLRE NVFQEHQTLK EKELETGPKA 
81 SHGYGGKFGV EQDRMDRSAV GHEYQSKLSK HCSQVDSVRG 
121 FGGKFGVQMD RVDQSAVGFE YQGKTEKHAS QKDYSSGFGG 
161 KYGVQADRVD KSAVGFDYQG KTEKHESQKD YSKGFGGKYG 
201 IDKDKVDKSA VGFEYQGKTE KHESQKDYVK GFGGKFGVQT 
241 ORQDKCALGW DHQEKLQLHE SQKDYKTGFG GKF6VQSERQ 
281 DSSAVGFDYK ERLAKHEPQQ DYAK6FGGKY GVQKDRMDKN 
321 ASTFEEVVQV PSAYQKTVPI EAVTSKTSNI RANFENLAKE 
361 REQEDRRKAE AERAQRMAKE RQEQEEARRK LEEQARAKKQ 
401 TPPASPSPQP lEDRPPSSPI YEDAAPFKAE PSYRGSEPEP 
441 EYSIEAAGIP EAGSQQGLTY TSEPVYETTE APGHYQAEDD 
481 TYDGYESDLG ITAIALYDYQ AAGDDEISFD PDDIITNIEM 
521 IDDGWWRGVC KGRYGLFPAN YVELRQ 

FIG. 27 



1 AAGCAGTCCT TCACCATGGT GGCCGACACT CCGGAAAACC TCCGCCTCAA 

51 GCAACAGAGC GAGCTGCAGA GTCAGGTGCG CTACAAGGAG GAGTTTGA6A 

101 AGAATAAGGG CAAAGGTTTC AGCGTGGTGG CAGACACGCC TGAGCTGCAG 

151 AGAATCAAGA AGACCCAGGA CCAGATCAGC AATATCAAAT ACCATGAGGA 

201 GTTTGAGAAG AGCCGCAT6G GGCCCAGTGG AGGAGAAGGG GTGGAACCAG 

251 AGCGCCGAGA AGCCCAGGAC AGCAGCAGCT ACCGGAGGCC CACAGAGCAG 

301 CAGCAGCCGC AGCCTCACCA TATCCCGACC AGTGCCCCCG TGTACCAGCA 

351 GCCCCAGCAG CAGCAGATGA CCTCGTCCTA TGGTGGGTAC AAG6AGCCAG 

401 CAGCCCCTGT CTCCATACAG CGCAGTGCCC CAGGTGGC6G T6GGAAACGG 

451 TACCGTGCAG TGTATGACTA CAGCGCTGCC GACGAGGACG AGGTCTCCTT 

501 CCAGGATGGG GACACCATCG TCAATGTGCA GCAGATCGAT GACGGCTGGA 

551 TGTACGGGAC CGTAGAGCGC ACCGGTGACA CGGGGATGCT GCCAGCCAAC 

601 TACGTGGAGG CCATCTGAAC CCTGTGCCGC CCCGCCCTGT CHCAATGCA 

651 TTCCATGGCA TCACATCTGT CCTGGGGCCT GACCCGTCCA CCCnCAGTG 

701 TCTCTGTCTT nAAGATCTT CAACT6CTTC TTTATCCCCG CCCCTCCAGC 

751 TTATTTTACC ATCCCAAGCC nGTTCTGCC CCTGTCATGG GCTCCTTCCT 

801 CTGGCAGGH TTCCCTTGGA CCAATCAACT GAnGATTTT TCTCTCTGGA 

851 TGGAACAGGC TGGGCACTCT GGGGAGGGCA GGAnGTTCT TAGCTAGGTA 

901 GAQCCCAGG GCTGGGCTGA ACTAGGAGAC CCACTAAGGA GATCAGTTTA 

951 GACTGGGTGC AGTGGCAAAC ACCCTTAAH CCCAGCGAAG GGAGTCAGAG 

1001 GCAGGCAGAT CTGIGACHG GAAGCCAGCC TGGTCTACAT CGAGAGTHC 

1051 AGGACAGCCA GAGCTATGTA GTGAGGCCCT GTCTCGGAGG AAGAGTGGGG 

1101 GTIGGHAGC TCTCAGCTTC ACHCCTGCC TTAGGCTCCT CAGAACCCCT 

1151 GGCCCAGCTC CCCCAACTCC CTTCCTCCTA GAGGTGGGGT GAGCTGTGC

1 KQSFTMVADT PENLRLKQQS ELQSQVRYKE EFEKNKGKGF SVVADTPELQ
51 RIKKTQDQIS NIKYHEEFEK SRMGPS6GEG VEPERREAQD SSSYRRPTEQ 
101 QQPQPHHIPT SAPVYQQPQQ QQMTSSYGGY KEPAAPVSIQ RSAPGGGGKR 
151 YRAVYDYSAA DEOEVSFQDG DTIVNVQQID DGWMYGTVER TGDTGMLPAN 
201 YVEAI 



FIG. 29 



1 ATGGCGGTGA ACCTGAGCCG GAACGGGCCG GCGCTGCAGG AGGCCTACGT 

51 GCGCGTAGTC ACCGAGAAAT CCCCGACCGA CTGGGCTCTT TTTACCTATG 

101 AAGGCAACAG CAATGACATC CGTGTGGCTG GCACAGGAGA GGGAGGCCTG 

151 GAGGAGCTGG TGGAAGAGCT CAACAGCGGG AAGGTGATGT ACGCCTTCTG 

201 CAGGGTGAAG GACCCCAACT CCGGCCTGCC CAAGTTTGTC CTCATCAACT 

251 GGACAGGAGA GGGTGTGAAT GATGTGCGGA AAGGAGCATG TGCCAACCAC 

301 GTCAGCACCA TGGCCAACTT CCTGAAGGGT GCCCACGTGA CCATCAATGC 

351 CCGGGCCGAG GAGGATGTGG AGCCTGAGTG CATCATGGAG AAGGHGCCA 

401 AGGCCTCTGG GGCCAACTAC AGCTTCCATA AGGAAAGCAC CTCCTTCCAG 

451 GAT6TAGGGC CGCAGGCCCC AGTGGGCTCT GTGTACCAGA AGACCAATGC 

501 CATATCTGAG ATCAAGAGAG TCGGCAAGGA TAACTTCTGG GCCAAAGCTG 

551 AGAAGGAAGA AGAGAACCGC CGCCTGGAGG AGAAGCGGCG TGCCGAAGAG 

601 GAGCGGCAGC GGTTGGAGGA GGAGCGACGA GAGCGGGAGC TGCAGGAGGC 

651 TGCCCGACGT GAGCAGCGCT ACCAGGAACA GCACAGATCA GCTGGAGCCC 

701 CGAGCAGGAC AGGTGAGCCA GAGCAGGAAG CCGHTCAAG GACCAGACAG 

751 GAGTGGGAGT CTGCTGGGCA GCAGGCCCCA CACCCACGAG AGAHHCAA 

801 GCAGAAGGAA AGGGCAATGT CCACCACCTC TGTCACCAGC TCGCAGCCGG 

851 GCAAGCTGAG GAGCCCCTTC CTGCAGAAGC AACTCACTCA ACCAGAAACC 

901 TCCTACGGCC GAGAGCCCAC AGCTCCTGTC TCCCGGCCTG CAGCAGGTGT 

951 CTGTGAGGAG CCAGCGCCTA GCACTCTGTC TTCTGCCCAG ACAGAAGAAG 

1001 AACCTACATA TGAAGTACCC CCAGAGCAGG ACACCCTCTA TGAGGAACCA 

1051 CCACTGGTAC AGCAGCAAGG GGCTGGCTCC GAACACAHG ACAACTACAT 

1101 GCAGAGCCAG GGCHCAGTG GACAAGGGCT GTGCGCCCGG GCCHGTATG 

1151 ACTACCAGGC AGCTGATGAC ACCGAGATCT CCTHGACCC TGAGAACCTA 

1201 ATCACAGGCA TCGAGGTGAT TGACGAAGGC TGGTGGCGAG GCTATGGGCC 

1251 TGACGGCCAC HIGGCATGI HCCTGCCAA CTACGTGGAG CTCATAGAGT 

1301 GA 
1 MAVNLSRNGP ALQEAYVRVV TEKSPTDWAL FTYEGNSNDI RVAGTGEGGL 
51 EELVEELNSG KVMYAFCRVK DPNSGLPKFV LINWTGEGVN DVRKGACANH 
101 VSTMANFLKG AHVTINARAE EDVEPECIME KVAKASGANY SFHKESTSFQ 
151 DVGPQAPVGS VYQKTNAISE IKRVGKDNFW AKAEKEEENR RLEEKRRAEE 
201 ERQRLEEERR ERELQEAARR EQRYQEQHRS AGAPSRTGEP EQEAVSRTRQ 
251 EWESAGQQAP HPREIFKQKE RAMSTTSVTS SQPGKLRSPF LQKQLTQPET 
301 SYGREPTAPV SRPAAGVCEE PAPSTLSSAQ TEEEPTYEVP PEQDTLYEEP 
351 PLVQQQGAGS EHIDNYMQSQ GFSGQGLCAR ALYDYQAADD TEISFDPENL 
401 ITGIEVIDEG WWRGYGPDGH FGMFPANYVE LIE 
1 MSVAGLKKQF YKASQLVSEK VGGAEGTKLD DDFKDMEKKV DVTSKAVAEV 
51 LVRTIEYLQP NPASRAKLTM LNTVSKIRGQ VKNPGYPQSE GLLGECMVRH 
101 GKELGGESNF GDALLDAGES MKRLAEVKDS LDIEVKQNFI DPLQNLCDKD 
151 LKIEQHHLKK LEGRRLDFDY KKKRQGKIPD EELRQALEKF EESKEVAETS 
201 MHNLLETDIE QVSQLSALVD AQLDYHRQAV QILEELADKL KRRVREASSR 
251 PKREFKPRPR EPFELGELEQ PNGGFPCAPA PKITASSSFR SSDKPIRMPS 
301 KSMPPLDQPS CKALYDFEPE NDGELGFREG DLITLTNQID ENWYEGMLHG 
351 QSGFFPLSYV QVLVPLPQ 
1 MAEMGSKGVT AGKIASNVQK KLTRAQEKVL QKLGKADETK DEQFEQCVQN 
51 FNKQLTEGTR LQKDLRTYLA SVKAMHEASK KLSECLQEVY EPEWPGRDEA 
101 NKIAENNDLL WMDYHQKLVD QALLTMDTYL GQFPDIKSRI AKRGRKLVDY 
151 DSARHHYESL QTAKKKDEAK IAKAEEELIK AQKVFEEMNV DLQEELPSLW 
201 NSRVGFYVNT FQSIAGLEEN FHKEMSKLNQ NLNDVLVSLE KQHGSNTFTV 
251 KAQPSDNAPE KGNKSPSPPP DGSPAATPEI RVNHEPEPAS GASPGATIPK 
301 SPSQPAEASE VVGGAQEPGE TAASEATSSS LPAVVVETFS ATVNGAVEGS 
351 AGTGRLDLPP GFMFKVQAQH DYTATDTDEL QLKAGDVVLV IPFQNPEEQD 
401 EGWLMGVKES DWNQHKELEK CRGVFPENFT ERVQ 
1 HNNCACTCA CCGTCCGTGG TNNNNSTMMC SGWYNKRNTK YRRKMSSKRW 
51 YKWKKCRRKS GCGGCGCCGA CCTGCGCGCG GAGGAAAGAA GTCGGHCGG 
101 CGGCGCCGGC GGAAACCGGA GHCGAGCGG GAGGCCTGAC GGCGGCAGGC 
151 GGCATGTCGG TGGCGGGGCT GAAGAAGCAG TTCTACAAGG CGAGCCAGCT 
201 GGTCAGCGAG AAGGHGGTG GGGCCGAAGG GACCAAACTG GATGATGACT 
251 TTAAAGATAT GGAAAAGAAG GTGGATGTCA CCAGCAAGGC CGTGGCAGAG 
301 GTGCTGGTCA GAACCATAGA ATATCTGCAG CCTAACCCAG CCTCGAGAGC 
351 CAAGCTGACT ATGCTGAACA CCGTATCCAA GATCCGGGGC CAAGTGAAGA 
401 ACCCTGGCTA CCCACAGTCA GAGGGTCTGT TGGGAGAGTG CAIGGHCGC 
451 CATGGCAAGG AACTAGGTGG AGAGTCCAAC nCGGTGATG CCCTGCTAGA 
501 TGCAGGTGAG TCCATGAAGC GCCTGGCTGA GGTGAAGGAC TCACTGGACA 
551 TCGAGGTCAA GCAGAACTTC ATTGACCCAC TACAGAACCT GTGTGACAAG 
601 GATCTGAAGG AGATCCAGCA CCACCTGAAG AAATTGGAGG GCCGCCGCCT 
651 TGACTHGAC TACAAGAAGA AGCGCCAGGG CAAGATCCCC GATGAGGAGC 
701 TGCGCCAGGC CCTAGAGAAG TTCGAGGAGT CCAAGGAGGT GGCGGAGACC 
751 AGTATGCACA ACCTCCTGGA GACTGATATA GAGCAGGTGA GCCAGCTCTC 
801 GGCCCTGGTG GATGCCCAGC TGGACTACCA CCGGCAGGCA GTGCAGATCC 
851 TGGAGGAGCT GGCTGACAAG CTGAAGCGCA GGGTTCGGGA AGCCTCCTCA 
901 CGCCCCAAGC GGGAGHCAA GCCCCGGCCC CGGGAGCCCT HGAGCTIGG 
951 AGAGCTGGAG CAGCCCAATG GGGGAHCCC CTGTGCCCCA GCACCTAAGA 
1001 TCACAGCCTC CTCATCAHT AGATCGTCAG ACAAGCCCAT CAGGATGCCC 
1051 AGCAAGAGCA TGCCACCCCT GGACCAGCCA AGCTGCAAGG CGCTTTATGA 
1101 TTTTGAGCCA GAGAATGATG GCGAGCTGGG CTTCCGTGAG GGGGACCTCA 
1151 TCACGCHAC CAACCAGATC GACGAGAACT GGTATGAGGG GATGCTGCAC 
1201 GGCCAATCAG GCHCnCCC ACTCAGCTAC GTGCAGGTGC TGGTGCCTCT 
1251 GCCTCAGTGA CTGGGCCTTT ACACCGCTGC CAGTCACAGT GCAGCAGCAG 
1301 TCTAATGCCA AGGTGCTCTA GAAACACTAA TGHCCTCCA GGGGGGACTC 
1351 CTCCCCACTC CCTCAGCCCT GGGGCCCCCC TATCCTAAGA CTCGGAAAGG 
1401 CCCACCCTGA GGHCTATTG CCTTCCTGGT GGTATCAGCT TCCAGCTGTT 
1451 TCAACCCTTC CCAGCCCGTT GCTGGCGATG GSCCNNYGCC CCCTCTCTAG 
1501 GCTCTCTAGA GGCAGGCAGG TCCTTGGAAT CCCCAGCCTG CAAGCAGAGG 
1551 CTGGCCAGCT CCCCAGCTCA GCACACAGAC ACACCTGGCA CCTGCTGCTC 
1601 ATGAAGAAGT GCACAAGGCA CAAATGTGTA CACHCCCAT GGGACCACAG 
1651 ACCCAGCTCA GCTCTGTTGA AGACCAAGCA CAAAGGCCH GAAGAGTGGA 
1701 CATTCCCAGG TCCCTGGCAC CTTCCCTTGA GCCAGCTCCA HGCTACTTA 
1751 nCATGTGAC TGAAGCTGAC CACAGGCAGC TGGCAGGTCC mTTTCAAC 
1801 CAGCAGGCTA GGCTGGCCAT AGACCCAGCT CTGCCTCACC CTGCCATGTT 
1851 CCAGTAATGG AGGCCTCCAG CCTGGGCTCT ATTACAHCT TCTCTACAGC 
1901 TGCCCCATAA CCCGTGGCTT ATCCCTGGCA CGTGGGGCCA CACCCCACGC 
1951 CCCCTGGATA GGCAACACTG TCCTGCTCCA GCCTGTGCTG ANATGAACTG 
2001 TACTCCTAAT TTnmTAA AAAAAAAGTA HAAATNTCT CTTTCTATAT 
2051 AAAANAAAGN TGGCCCTANN NGGA 
1 CCTCACTCGC TCTCCCCGCG CACGCTCCGT CTCCGTCAGT CCCCTGAGCT 
51 GTTCTAGTGC 6CGGCGTGGA GCCAGGGCTC AGGCTGGTGG AGCGGCC6GG 
101 GCT6GAGGCT GG6AGTGCGG CGCGCACGGC CTCCCCGCGC CATTATCCGC 
151 6CTCGCTTCG GGCGAGGCCG GCGCCAGGAT G6CAGAGAT6 GGGAGCAAGG 
201 GGGTGACGGC GGGGAAGATC GCCAGCAACG TACAGAAGAA GCTGACCCGA 
251 GCGCAGGAGA AGGTCCTGCA GAAACTGGGG AAGGCGGACG AGACGAAGGA 
301 CGAGCAGTTT GAGCAGTGTG TCCAGAACH CAATAAGCAG CTGACAGAGG 
351 GTACCCGGCT GCAGAAGGAT CHCGGACCT ATCTG6CTTC TGTTAAAGCG 
401 ATGCACGAAG CCTCCAAGAA GCTGAGTGAG TGTCTTCAGG AGGTGTACGA 
451 GCCC6AGTGG CCTGGCAGGG ATGAAGCAAA CAAGATTGCA GAGAACAATG 
501 ACCTACTCTG GATGGACTAC CACCAGAAGC TG6T6GACCA GGCTCTGCTG 
551 ACCATGGACA CCTACCTAGG CCAGHCCCT GATATCAAGT CGCGCATTGC 
601 CAAGCGGGGG CGGAAGCTGG TGGACTATGA CAGTGCCCGG CACCACTATG 
651 AGTCTCnCA AACCGCCAAA AAGAAGGATG AAGCCAAAAT TGCCAA6GCA 
701 GAAGAGGAGC TCATCAAAGC CCAGAAGGTG TTCGAGGAGA TGAACGTGGA 
751 TCTGCAGGAG GAGCTGCCAT CCCTGTGGAA CAGCCGTGTA GGHTCTATG 
801 TCAACACGH CCAGAGCATC GCGGGTCTGG AGGAAAACTT CCATAAAGAG 
851 ATGAGTAAGC TCAATCAGAA CCTCAATGAT GTCCTGGTCA GCCTAGAGAA 
901 GCAGCACGGG AGCAACACCT TCACAGTCAA GGCCCAACCC AGTGACAATG 
951 CCCCTGA6AA AGGGAACAAG AGCCC6TCAC CTCCTCCAGA TGGCTCCCCT 
1001 GCTGCTACCC CTGAGATCAG AGTGAACCAT GAGCCA6A6C CGGCCAGTG6 
1051 GGCCTCACCC GGGGCTACCA TCCCCAAGTC CCCATCTCAG CCAGCAGAGG 
1101 CCTCCGAGGT GGTGGGTGGA GCCCAGGAGC CAGGGGAGAC AGCAGCCAGT 
1151 GAAGCAACCT CCAGCTCTCT TCCGGCTGTG GTGGTGGAGA CCTTCTCCGC 
1201 AACT6TGAAT GQGGCG6TGG AGGGCAGC6C TGGGACTGGA CGCTTGGACC 
1251 TGCCCCCGGG AHCATGITC AAGGHCAAG CCCAGCATGA TTACACGGCC 
1301 ACTGACACTG ATGAGCTGCA ACTCAAAGCT GGCGATGTGG TGHGGTGAT 
1351 TCCTHCCAG AACCCAGAGG AGCAGGATGA AGGCTGGCTC ATGGGTGTGA 
1401 AGGAGAGCGA CTGGAATCAG CACAAGGAAC TGGAGAAATG CCGCG6C6TC 
1451 TTCCCGGAGA ATTTTACAGA GCGGCTACAG TGACGGAGGA GCCHCCGGA 
1501 GTGTGAAGAA CCTTTCCCCC AAAGATGTGT G 
1 GAATTCGTCG ACCCACGCGT CCGGTTTGAG CAGTGCGTCC 
41 AGAATHCAA CAAGCAGCTG ACGGAGGGCA CCCGGCTGCA 
81 GAAGGATCTC CGGACCTACC TG6CCTCCGT CAAAGCCATG 
121 CACGAGGCTT CCAA6AAGCT 6AATGAGTGT CTGCAGGAGG 
161 TGTATGAGCC CGATTGGCCC GGCAGGGATG AGGCAAACAA 
201 GATCGCAGAG AACAACGACC TGCTGTGGAT GGATTACCAC 
241 CAGAAGCTGG TGGACCAGGC GCTGCTGACC ATGGACACGT 
281 ACCTGGGCCA GTTCCCCGAC ATCAA6TCAC GCATTGCCAA 
321 GCGGGG6CGC AAGCTGGTGG ACTACGACAG TGCCCGGCAC 
361 CACTACGAGT CCCHCAAAC TGCCAAAAAG AAGGATGAAG 
401 CCAAAATTGC CAAGGCCGAG GAGGAGCTCA TCAAAGCCCA 
441 GAAGGTGTn GAGGAGATGA ATGTGGATCT GCAGGAGGAG 
481 CTGCCGTCCC TGTGGAACAG CCGCGTAGGT TTCTACGTCA 
521 ACACGTTCCA GAGCATCGCG GGCCTGGAGG AAAACTTCCA 
561 CAAGGAGATG AGCAAGCTCA ACCAGAACCT CAATGATGTG 
601 CTGGTCG6CC TGGAGAAGCA ACACGG6A6C AACACCTCCA 
641 CGGTCAAGGC CCAGCCCAGT GACAACGCGC CT6CAAAAGG 
681 GAACAAGAGC CCHCGCCTC CAGATGGCTC CCCTGCCGCC 
721 ACCCCCGAGA TCAGAGTCAA CCACGAGCCA GAGCCGGCCG 
761 GCGGGGCCAC GCCCGGGGCC ACCCTCCCCA AGTCCCCATC 
801 TCAGCCAGCA GAG6CCTC66 AG6TGGCGGG TGGGACCCAA 
841 CCTGCGGCTG GAGCCCAGGA GCCAGGGGAG ACGGCGGCAA 
881 GTGAAGCAGC CTCCAGCTCT CTTCCTGCTG TCGTGGTGGA 
921 GACCnCCCA GCAACTGTGA ATGGCACCGT GGAGGGCGGC 
961 AGTGGG6CC6 GGCGCTTGGA CCTGCCCCCA GGHTCAIGT 
1001 TCAAGGTACA GGCCCA6CAC GACTACACGG CCACTGACAC 
1041 AGACGAGCTG CAGCTCAAGG CTGGTGATGT GGTGCTGGTG 
1081 ATCCCCTTCC AGAACCCTGA AGAGCAGGAT GAAGGCTGGC 
1121 TCATGGGCGT GAAGGAGAGC GACTGGAACC AGCACAAGGA 
1161 GCTGGAGAAG TGCCGTGGCG TCTTCCCCGA GAACHCACT 
1201 GAGAGGGTCC CATGACGGCG GGGCCCAGGC AGCCTCCGGG 
1241 CGTGTGAAGA ACACCTCCTC CCGAAAAATG TGIGGHCTT 
1281 ITTTTTGTTT TGnTTCGTT TTTCATCTTT TGAAGAGCAA 
1321 AGGGAAATCA AGAGGAGACC CCCAGGCAGA GGGGCGTTCT 
1361 CCCAAAGAH AGGTCGHH CCAAAGAGCC GCGTCCCGGC 
1401 AAGTCCGGCG GAATTCACCA GTGTCCTGAA GCTGCTGTGT 
1441 CCTCTAGTTG AGTTCTGGCG CCCCTGCCTG TGCCCGCATG 
1481 TGTGCCTGGC CGCAGGGC6G GGCTGGGGGC TGCCGAGCCA 
1521 CCATGCTTGC CTGAAGCTTC GGCCGCGCCA CCCQGGCAAG 
1561 GGTCCTCTTT TCCTGGCAGC TGCT6TGGGT GGGGCCCAGA 
1601 CACCAGCCTA ACCTGGCTCT GCCCCGCAGA CGGTCTGTGT 
1641 GCTGTTTGAA AATAAATCTT AGTGHCAAA ACAAAATGAA 
1681 ACAAAAAAAA TGATAAAAAA AAAAAAAAAA AAAAAAAAAA 
1721 AAAAGGGCQG CCGC 
1 EFVDPRVRFE QCVQNFNKQL TEGTRLQKDL RTYLASVKAM 
41 HEASKKLNEC LQEVYEPDWP GRDEANKIAE NNDLLWMDYH 
81 QKLVDQALLT MDTYLGQFPD IKSRIAKRGR KLVDYDSARH 
121 HYESLQTAKK KDEAKIAKAE EELIKAQKVF EEMNVDLQEE 
161 LPSLWNSRVG FYVNTFQSIA GLEENFHKEM SKLNQNLNDV 
201 LVGLEKQHGS NTSTVKAQPS DNAPAKGNKS PSPPDGSPAA 
241 TPEIRVNHEP EPAGGATPGA TLPKSPSQPA EASEVAGGTQ 
281 PAAGAQEPGE TAASEAASSS LPAVVVETFP ATVNGTVEGG 
321 SGAGRLDLPP GFMFKVQAQH DYTATDTDEL QLKAGDVVLV 
361 IPFQNPEEQD EGWLMGVKES DWNQHKELEK CRGVFPENFT 
401 ERVP 
1 MWKSVVGHDV SVSVETQGDD WDTDPDFVND ISEKEQRWGA KTIEGSGRTE 
51 HINIHQLRNK VSEEHDILKK KELESGPKAS HGYGGQFGVE RDRMDKSAVG 
101 HEYVADVEKH SSQTDAARGF GGKYGVERDR ADKSAVGFDY KGEVEKHASQ 
151 KDYSHGFGGR YGVEKDKRDK AALGYDYKGE TEKHESQRDY AKGFGGQYGI 
201 QKDRVDKSAV GFNEMEAPTT AYKKTTPIEA ASSGARGLKA KFESLAEEKR 
251 KREEEEKAQQ MARQQQERKA VVKMSREVQQ PSMPVEEPAA PAQLPKKISS 
301 EVWPPAESHL PPESQPVRSR REYPVPSLPT RQSPLGNHLE DNEEPPALPP 
351 RTPEGLQVVE EPVYEAAPEL EPEPEPDYEP EPETEPDYED VGELDRQDED 
401 AEGDYEDVLE PEDTPSLSYQ AGPSAGAGGA GISAIALYDY QGEGSDELSF 
451 DPDDIITDIE MVDEGWWRGQ CRGHFGLFPA NYVKLL 
1 MAGNFDSEER SSWYWGRLSR QEAVALLQGQ RHGVFLVRDS STSPGDYVLS 

51 VSENSRVSHY IINSSGPRPP VPPSPAQPPP GVSPSRLRIG DQEFDSLPAL 

101 LEFYKIHYLD TTTLIEPVAR SRQGSGVILR QEEAEYVRAL FDFNGNDEED 

151 LPFKKGDILR IRDKPEEQWW NAEDSEGKRG MIPVPYVEKY RPASASVSAL 

201 IGGNQEGSHP QPLGGPEPGP YAQPSVNTPL PNLQNGPIYA RVIQKRVPNA 

251 YDKTALALEV GELVKVTKIN VSGQWEGECN GKRGHFPFTH VRLLDQQNPD 

301 EDFS 
1 CAGCCGCTGG AGGGGGCGCC TGGTGTAGAT GTGAAAAGCC GTAACCAGGA 
51 ACCAGTAAAG ATGT6GAAGT CTGTAGTGGG GCATGATGTA TCGGTTTCCG 
101 T66A6ACCCA GGGTGAT6AC TG6GATACAG ACCCTGACTT T6TGAATGAC 
151 ATCTCCGAGA AGGAGCAACG GTGGGGA6CC AAGACCAHG AGGGCTCTGG 
201 ACGCACAGAG CACATCAACA TCCACCAGCT GAGGAACAAA GTGTCAGAGG 
251 AGCACGACAT CCTCAAGAAG AAGGAGCTGG AATCGGGGCC TAAGGCATCC 
301 CATGGCTATG GCGGTCAGH TGGAGTGGAG AGAGACCGGA TGGACAAGAG 
351 TGCCGTGGGC CACGAGTATG TTGCTGATGT GGAGAAACAC TCATCTCAGA 
401 CTGATGCSGC CAGAGGCTTT GGGGGCAAAT ATGGAGTTGA GAGGGACCGG 
451 GCAGACAAGT CAGCG6TGGG CTHGACTAC AAAGGAGAAG TGGAAAAGCA 
501 TGCATCTCAG AAAGATTACT CTCATGGCTT TGGTGGCCGC TACGGGGTAG 
551 AGAAGGATAA ACGGGACAAA GCAGCCCTGG 6ATACGACTA CAAAGGAGAG 
601 ACGGAGAAGC ACGAGTCTCA GAGAGATTAT GCCAAGGGCT TTGGTGGCCA 
651 ATATGGAATC CAGAAAGACC GAGTGGATAA GAGIGCTGH GGCTTCAATG 
701 AAATGGAGGC CCCAACCACG GCGTATAAGA AGACAACACC CATAGAAGCT 
751 GCTTCCAGTG GTGCCCGTGG GCTGAAGGCA AAATHGAGT CCCTGGCTGA 
801 GGAGAAGAGG AAGCGAGAGG AAGAAGAGAA GGCACAGCAG ATGGCCAGGC 
851 AGCAACAGGA GCGAAAGGCT GTGGTAAAGA TGAGCCGAGA AGTCCAGCAG 
901 CCATCCATGC CTGTGGAAGA GCCAGCGGCA CCAGCCCAGT TGCCCAAGAA 
951 GATCTCCTCA GAGGTCTGGC CTCCAGCAGA GAGTCACCTA CCGCCAGAGT 
1001 CTCAGCCAGT GAGAAGCAGA AQGGAATACC CTGTGCCCTC TCTGCCCACG 
1051 AGGCAGTCTC CATTGCAGAA TCACHGGAG GACAACGAGG AGCCCCCAGC 
1101 TCTGCCCCCT AGGACCCCAG AAGGCCTCCA GGTGGTGGAA GAGCCAGTGT 
1151 ACGAAGCAGC ACCCGAGCTG GAGCCGGAGC CAGAGCCTGA CTATGAGCCA 
1201 GAGCCAGAGA CAGAGCCTGA CTATGAGGAT GTTGGGGAGT TAGATCGGCA 
1251 GGATGAGGAT GCAGAGGGAG ACTATGAGGA TGTGCTGGAG CCCGANGACA 
1301 CCCCTTCTCT GTCCTACCAA GCTGGACCCT CAGCTGGGGC TGGTGGTGCG 
1351 GGGATCTCTG CTATAGCCCT GTATGATTAC CAAGGAGAGG GAAGCGATGA 
1401 GCTTTCCnT GATCCAGATG ACATCATCAC TGACAHGAG ATGGTGGATG 
1451 AAGGCTGGTG GCG6GGCCAA TGCCGTGGGC ACTTTGGACT TTTCCCTGCA 
1501 AACTATGTCA AGCTCCTCTA ATGACCAGCC CAHGTCHC CGACHCCCG 
1551 AATTCGAAGC TGCTCTGCCT CCCTCTTCCC ACTCCATGGT ACTGCTGCAA 
1601 GGACCTGGCT GAACATCATG AGATGCCTGA AGHCTGGCA GTCTGTCTCC 
1651 CGCCTCTHA AGAGCTHAG GTAGAATCGC TCCAGGTGGG GGTGGGGGTG 
1701 GGGGTGGGAT CCCTCTGTCC CTCTGTGACC ACTCTTCCCT GAGGTAGCTC 
1751 ATGAAATCAT CTTGCAGACC TGCCTCCHC AGCCGCACCC CAGCTCTGCC 
1801 AACCTTGCTC TAGAGTGCTG GGAHCCCTT GCCCCGACCC TGGGTGCCAG 
1851 CCTAGAGGGG AGGCTCTCAC AGGGCTGCCT GATTCGCCCT GTTGTGCTTT 
1901 TGCTCATTTT TCHCCCTTA GCAGACAAAT TGGAACTGCC CTTCTGTTTA 
1951 GTCCTAAAAC TGAAAATAAA ATGAGACTGT GGCTAAAAAA AAAAAAAAAA 
2001 AAA 
1 GGATCCCCGG AGCCGGTCCG CTGGGCGGGG CGCAGGGCTG GAGGGGCGCG 
51 CGTGCCGGCG 6CG6CCCA6C GTGAAAGCGC GGAGGCGGCC ATGGC6GGCA 
101 ACTTCGACTC GGAGGAGCGG AGTAGCTGGT ACTG6GGCCG CCTGAGCCGG 
151 CAGGA6GCGG TGGCGCTATT GCAGGGCCAG CGGCACGGGG TGTTCCTGGT 
201 GCGGGACTCG AGCACCAGCC CCGGGGACTA TCTGCHAGC GTCTCCGAAA 
251 ACTCGCGCGT CTCCCACTAC ATCATCAACA GCAGCGGCCC GCGCCCTCCA 
301 GTGCCTCCGT CGCCCGCTCA GCCTCCGCCG GGAGTGAGTC CCTCCAGGCT 
351 CCGAATAGGA GATCAAGAAT TTGAHCAn GCCTGCTTTA CTGGAATTCT 
401 ACAAAATACA CTATTTGGAC ACTACAACAT TGATAGAACC AGTGGCCAGA 
451 TCAAGGCA6G GTAGTGGAGT GATTCTCAGG CAGGAGGAGG CAGAGTATGT 
501 GCGGGCCCTG nTGACTTTA ATGGGAATGA TGAAGAA6AT CnCCCTTTA 
551 AGAAAGGAGA CATCCTGAGA ATCCGGGATA AGCCTGAAGA CGAGTGGTGG 
601 AATGCAGAGG ACAGCGAAGG AAAGAGGGGG ATGAHCCTG TCCCHACGT 
651 GGAGAAGTAT AGACCTGCCT CCGCCTCAGT ATCGGCTCT6 ATTGGAGGTA 
701 ACCAGGAGGG HCCCACCCA CAGCCACTGG 6TGGGCC6GA GCCTGGGCCC 
751 TATGCCCAAC CCAGCGTCAA CACTCCGCTC CCTAACCTCC AGAATGGGCC 
801 CAHTATGCC AGGGHATCC AGAA6CGAGT CCCTAATGCC TACGACAAGA 
851 CAGCCnGGC TrTGGAGGTC GGTGAGCTGG TAAAGGTTAC GAAGATTAAT 
901 GTGAGT6GTC AGTGGGAAGG GGAGTGTAAT GGCAAACGAG GTCACTTCCC 
951 AHCACACAT GTCCGiaGC TGGATCAACA GAATCCCGAT GAGGACHCA 
1001 GCTGAGTATA 6CTCGACAGT HGCTGACAG ATGGAACAAT CTGTTTTCCC 
1051 CCAAHGCCA TCTATACAAT ITTCnACAG GTGTCAAAGC AGTGTAGni 
1101 ATATAAGCAT TCTGnACCT GGGATCHTT HAAGACTGA ACTACTCCAT 
1151 TCTCACHGT ATTTACCATA HCAGGGTAC GAAACCGGAG GGCHATGTG 
1201 GTTAACnCT GAGHGGCAG TTTTAGGTGG TAGTGGCCGT GCCTGTATGA 
1251 GAAGAC6TAA ATACATTGCC TCCTTTCTn TGGGCAAAAC AGATCA 

FIG. 40 



1 MSSECDVGSS KAVVNGLASG NHGPDKDMDP TKICTGKGTV TLRASSSYRG 
51 TPSSSPVSPQ ESPKHESKSD EWKLSSSADT NGNAQPSPLA AKGYRSVHPS 
101 LSADKPQGSP LLNEVSSSHI ETDSQDFPPT SRPSSAYPST TIVNPTIVLL 
151 QHNREQQKRL SSLSDPASER RAGEQDPVPT PAELTSPGRA SERRAKDASR 
201 RVVRSAQDLS DVSTDEVGIP LRNTERSKDW YKTMFKQIHK LNRDDDSDVH 
251 SPRYSFSDDT KSPLSVPRSK SEMNYIEGEK VVKRSATLPL PARSSSLKSS 
301 PERNDWEPLD KKVDTRKYRA EPKSIYEYQP GKSSVLTNEK MSRDISPEEI 
351 DLKNEPWYKF FSELEFGRPS SAVSPTPDIT SEPPGYIYSS NFHAVKRESD 
401 GTPGGLASLE NERQIYKSVL EGGDIPLQGL SGLKRPSSSA STKDSESPRH 
451 FIPADYLEST EEFIRRRHDD KEKLLADQRR LKREQEEADI AARRHTGVIP 
501 THHQFITNER FGDLLNIDDT AKRKSGLEMR PARAKFDFKA QTLKELPLQK 
551 GDVVYIYRQI DQNWYEGEHH GRVGIFPRTY IELLPPAEKA QPRKLAPVQV 
601 LEYGEAIAKF NFNGDTQVEM SFRKGERITL LRQVDENWYE GRIPGTSRQG 
651 IFPITYVDVL KRPLVKTPVD YIDLPYSSSP SRSATVSPQA SHHSLSAGPD 
701 LTESEKNYVQ PQAQQRRVTP DRSQPSLDLC SYQALYSYVP QNDDELELRD 
751 GDIVDVMEKC DDGWFVGTSR RTRQFGTFPG NYVKPLYL 
1 CCTCACCGNN CCTGGTGTAG GTACCGGATC GAATTCAAGC GAAAAACAGA 
51 GCGGGGCTGA CTGTAGCGTG GAGCGCGAGC CGGGCTG6AC GCGCGCAAGC 
101 CCHGCCGGG GACCCGCGAG GCAAGCAGTC TCCCTGTGGA GCGTCGTCCT 
151 CCATCCCTGT AAGCACCGTT ACAGAGAATG AAACAAGG6C AGAAGHACA 
201 GAGCCCGTGA GGCATCHCA AATAGAAGAC TGGAGACTAG AAASAGAATA 
251 HGCCAGGAG nGGCATCCA HGGAAGACC TTGAGATCCT CTCAGCTCAC 
301 AACTCCAGGA CCGATGCATC HCCCACCAC CHGAAGCAC TGAGCCCTCC 
351 AGAGCTGCAT CTGGGAAGAC TCGCCTGCCT CCAGCATGAG TTCTGAATGT 
401 GATGTTGGAA GCTCTAAAGC TGTGGTGAAT GGCHGGCAT CTGGCAACCA 
451 TGGACCAGAC AAAGACATGG ACCCTACCAA AATCTGCACT GGGAAAGGAA 
501 CAGTGACTCT TCGGGCCTCG TCTTCCTACA GGGGAACCCC AAGCAGCAGC 
551 CCTGTGA6CC CCCAGGAATC TCCGAAGCAT GAAAGCAA6T CAGATGAATG 
601 GAAACHTCT TCCAGTGCAG ATACCAATGG CAACGCCCAG CCCTCCCCAC 
651 HGCTGCCAA GGGCTATAGA AGTGTGCATC CCAGCCTTTC TGCTGACAAG 
701 CCCCAGGGCA GTCCHTACT AAACGAAGH TCTTCHCCC ACAHGAAAC 
751 CGATTCCCAA GACTTCCCTC CAACAAGCAG ACCTTC6TCT 6CCTACCCCT 
801 CCACCACCAT CGTCAACCCT ACCATTGT6C TCCTGCAGCA CAATCGAGA6 
851 CAGCAAAAGC GACTCAGTAG TCTTTCAGAT CCTGCCTCAG AGAGAAGAGC 
901 GGGTGAGCAG GACCCAGTAC CAACCCCAGC AGAACTCACT TCGCCCG6CA 
951 GGGCnCTGA GAGAAGGGCA AAGGATGCTA GCAGACGGGT GGTGAGGAGC 
1001 GCACAGGACC TGAGCGATGT GTCTACAGAT GAAGTGGGCA TTCCACTCCG 
1051 GAATACCGAG CGATCGAAAG ACTGGTACAA AACTATGTTT AAACAGATCC 
1101 ACAAACT6AA CAGAGATGAT GATTCTGATG TCCATTCCCC TCGATACTCC 
1151 nCTCTGATG ACACAAAGTC TCCCCTTTCT GTGCCTCGCT CAAAAAGTGA 
1201 GATGAACTAC ATCGAAGGGG AGAAAGTGGT TAAGAGGTCC GCCACACTCC 
1251 CCCTCCCAGC CCGCTCTTCC TCACTCAAGT CCAGCCCGGA AAGAAACGAC 
1301 T6G6AGCCCC TAGATAAGAA AGTGGATACG AGAAAATACC GAGCAGAGCC 
1351 CAAAAGCAH TACGAATATC AGCCGGGCAA GTCTTCGGTC CTGACCAATG 
1401 AGAAGATGAG TCGGGATATA AGCCCAGAAG AGATAGATTT AAAGAATGAA 
1451 CCTTGGTATA AATTCTTHC GGAATTGGAG HTGGGAGAC CGAGCTCAGC 
1501 AGTCAGCCCG ACTCCAGACA HACGTCAGA GCCTCCTGGA TATATCTATT 
1551 CTTCCAACn CCATGCAGTG AAGAGAGAAT CGGACGGGAC CCCCGGGGGT 
1601 CTCGCTAGCT TGGAGAATGA GAGGCAGATC TATAAGAGTG TCTTGGAAGC 
1651 T6GCGACATC CCTCTTCAGG GCCTCAGTG6 GCTCAAGCGA CCTTCCA6CT 
1701 CAGCTTCCAC TAAAGATTCA GAGTCACCAA GACATHTAT ACCAGCTGAT 
1751 TACTTGGAGT CCACAGAAGA ATHAnCGG AGACG6CACG ATGATAAAGA 
1801 GAAACTTTTA GCGGACCAGA GACGACHAA GCGCGA6CAA GAAGAGGCCG 
1851 ATATTGCAGC TCGCCGCCAC ACAGGTGTCA TCCCGACTCA TCATCAGHT 
1901 ATCACTAATG AGCGCTTTGG GGACCTCCTC AATATAGATG ATACGGCCAA 
1951 AAGGAAATCT GGGTTAGAGA TGAGACCTGC TCGAGCCAAA TTTGACTHA 
2001 AAGCCCAGAC CCTGAAGGAG CTGCCTCTGC AGAAGGGAGA CGnGTITAC 
2051 ATCTACAGAC AGATTGACCA GAACTGGTAT GAAGGTGAAC ACCATGGCCG 
2101 GGTGGGAATC HCCCACGCA CCTATATC6A GCTTCTTCCT CCAGCTGA6A 
2151 AGGCTCAGCC CAGAAAGTT6 GCACCCGTAC AAGTTnGGA ATATGGAGAA 
2201 GCCATTGCAA AGTTTAACTT TAATGGAGAT ACACAAGTAG AAATGTCTTT 
2251 CCGAAAGGGG GAGAGGATCA CGCTGCTCCG ACAGGTGGAT GAGAACTGGT 
2301 ATGAAGGGAG GATTCCTGGG ACATCTCGCC AAGGCATTTT CCCTATCACC 
2351 TATGTAGATG TGCHAAGAG GCCATTGGTG AAAACCCCT6 TGGAHACAT 
2401 CGACCTGCCT TAHCncn CCCCAAGTCG CAGTGCCACT GTGAGCCCAC 
2451 AGGCTTCTCA TCAnCAHG AGCGCAQGAC CTGATCTCAC AGAATCTGAA 
2501 AAGAACTATG TGCAACCTCA AGCCCAGCAG CGAAGAGTCA CCCCAGACAG 
2551 GAGTCAGCCC TCACTGGAn TGTGTAGCTA CCAAGCGHA TATAGTTATG 
2601 TGCCACAGAA CGATGATGAG TTGGAACTCC 6AGATGGAGA TATTGHGAT 
2651 GTCATGGAAA AATGTGACGA IGGAIGGHT GnGGCACTT CGAGAAGGAC 
2701 GAGGCAGTTT GGTACnTTC CAGGCAACTA TGTAAAACCT TTATATCTAT 
2751 AAGAAGACTA AAAAGCACAG AGATTAHn TTATCGGAGG ATGAAGCATC 
2801 ATTCATGAAC TGGTCTCTTT AHTAAGTAC TGAGTCAGTA AGAAAACTAA 
2851 TGCAGTTGGT AAAGAAAGAA TTCAAAGAAG GAACAGAGAA CTGIGTHGA 
2901 AACCCAHGT GTATCAGGGA HAACTATCT GCTGAAGACA TCTGTATTTA 
2951 CATGACTGCT TCTGGGAGCT GCTCTAGCCC CCGCTGCHG GGGAATCTGA 
3001 TCTGGAGCAT GTCCATGAGC AACATTAGCC AAAAAAAAAA GCTTGGGCCC 
3051 TATTCTATAG TGTCACCTAA ATACTAGCTT GATCCGGCTG CTAACAAAGC 
3101 CCGAAAGGAA GCTGAGTTGC TGCTGCCACC GCT6AGCAAT AACTA6CATA 
3151 ACCCCTTGGG GCCTCTAAAC GGGTCHGAG GGGTiniTG GCTGAAAGGA 
3201 GGAACTATAT CCGGATAACC TGGCGTAATA GCGAAGAGGC CCGCACCGAT 
3251 CGCCCTTCCC AACAGHGGG CAGCCTGAAT GGCGAATGGA CGCGCCCTGT 
3301 AGCGGCGCAT TAAGCGCGGC GGGTGTGGTG GTTACGCGCA GGGTG 
1 HNNCACTCA CCGTCCTGGT GATGGTACCG GATCGAATTC AAGCGTGGCC 
51 GTGGCCGTGG GGCGCGCGGG GACCGCCCGG GGTGCCCGCT CCGCTCAGCG 
101 TCCGGGCCGC GTGGTCCGGC GGAGCCCCGA GACCACCCCC GGGCGGGGCG 
151 CCGCCGCGAT GTCGGTGGCT GGGCTCAAGA AGCAGHCCA CAAAGCCAGC 
201 CAGCTGTTTA GTGAAAAAAT AAGTGGTGCC GAAGGAACGA AGCTAGATGA 
251 AGAATTTCTG AACATGGAAA AGAAAATAGA TATCACCAGT AAAGCTGHG 
301 CAGAAATCCT HCAAAAGCC ACAGAGTATC TCCAACCCAA TCCAGCATAC 
351 AGAGCTAAGC TAGGAATGCT GAACACTGTG TCGAAGCTCC GAGGGCAGGT 
401 GAA6GCCACC G6CTACCCAC AGACGGAAGG CHGCTGGGG GACTGCATGC 
451 TGAAGTATGG CAAGGAGCTC GGAGAAGACT CTGCnTTGG CAACTCGHG 
501 GTAGATGHG GTGAGGCCCT GAAACTCATG GCTGAGGTGA AAGACTCTCT 
551 GGATATTAAT GTGAAGCAAA CnHAHGA CCCACTGCAG CTACTGCAAG 
601 ACAAAGATTT AAAGGAGATC GGGCACCACC T6AGAAAGCT GGAAGGCCGT 
651 CGCCTGGAn ATGATTATAA AAAGCGGCGG GTAGGTAAGA TCCCCGAGGA 
701 AGAAATCAGA CAAGCAGTAG AGAAGTTTGA AGAGTCAAAG GAGHGCCCG 
751 AAAGGAGCAT GITTAATTTr TTAGAAAATG ATGTAGAGCA AGTGAGCCAG 
801 CTGGCTGTGT HGTAGAGGC GGCATTAGAC TATCACAGGC AGTCCACAGA 
851 GATCCTCCAG GAGCTGCAGA GCAAGCTGGA GHGCGAATA TCTCHGCAT 
901 CCAAAGTCCC CAAGCGAGAA HCATGCCAA AGCCTGTGAA CATGAGHCC 
951 ACCGATGCCA ATGGGGTCGG ACCCAGCTCT TCATCAAAGA CACCAGGTAC 
1001 TGACACTCCC GCGGACCAGC CCTGCTGTCG TGGTCTCTAT GACTHGAGC 
1051 CAGAAAATGA AGGAGAATTA GGATHAAAG AAGGGGACAT CATTACAnA 
1101 ACCAATCAGA TAGATGAAAA CTGGTATGAA GGGATGCTTC GTGGGGAATC 
1151 CGGCnCTTC CCCAHAAn ACGTGGAAGT CAHGIGCCT TTACCTCCGT 
1201 AAATGTGTCT HIGGACCTA ACHCAGAAC TGAAATGAAT TGGCACCA6T 
1251 GCTCTCTCAG TGTGGTGnC TGTGACANCC TCGCTCTCTG GCCCACHAA 
1301 TCACTTTTGT ATGIGIGHT TCTTTATAAT GTATITTGTA TCAATHAAT 
1351 HGTATAACT GAnTCTTTG TCCTAACTCA TAAAAATAGT TTTCnCHG 
1401 TTCTAAAAAG TCAnGGHA AATGTATTTG CTTCCTGTTG CTAAAACGAG 
1451 TAAAHGCGC CCATTCGAAT GGCCTGGGTA GTCCHGACT GCAGTGGGAA 
1501 CGCACCCTTT GCAGCCATGA AAGCTAAAGG TTTGTTTCCT GACATTAHG 
1551 ATGGCCTCTG GTCTTHCCT GITTTAAGCT TACCTGTGAA CAGCCCAATA 
1601 AACNTGACAC ACTGTANAAT AANAAGGGTG GCCCNA 
1 MSVAGLKKQF HKASQLFSEK ISGAEGTKLD EEFLNMEKKI 

51 LSKATEYLQP NPAYRAKLGM LNTVSKLRGQ VKATGYPQTE 

101 GKELGEDSAF GNSLVDVGEA LKLMAEVKDS LDINVKQTFI 

151 LKEIGHHLRK LEGRRLDYDY KKRRVGKIPE EEIRQAVEKF 

201 MFNFLENDVE QVSQLAVFVE AALDYHRQST EILQELQSKL 

251 PKREFMPKPV NMSSTDANGV GPSSSSKTPG TDTPADQPCC 

301 EGELGFKEGD IITLTNQIDE NWYEGMLRGE SGFFPINYVE 
1 MSGSYDEASE EITDSFWEVG NYKRTVKRID DGHRLCNDLM SCVQERAKIE 
51 KAYAQQLTDW AKRWRQLIEK GPQYGSLERA WGAMMTEADK VSELHQEVKN 
101 SLLNEDLEKV KNWQKDAYHK QIMGGFKETK EAEDGFRKAQ KPWAKKMKEL 
151 EAAKKAYHLA CKEERLAMTR EMNSKTEQSV TPEQQKKLVD KVDKCRQDVQ 
201 KTQEKYEKVL EDVGKTTPQY MEGMEQVFEQ CQQFEEKRLV FLKEVLLDIK 
251 RHLNLAENSS YMHVYRELEQ AIRGADAQED LRWFRSTSGP GMPMNWPQFE 
301 EWNPDLPHTT AKKEKQPKKA EGATLSNATG AVESTSQAGD RGSVSSYDRG 
351 QTYATEWSDD ESGNPFGGNE ANGGANPFED DAKGVRVRAL YDYDGQEQDE 
401 LSFKAGDELT KLGEEDEQGW CRGRLDSGQL GLYPANYVEA I 
1 CGGGCHGAG GCTGGGCCGC CGCCGCCGCC CGCTTTGCCA CCCGCCCCGC 
51 TGATGGTGTC CGGTGCTCCG GCGCCCAGGG ACACAGACCG GGAGCAGGAC 
101 CACTTCTCTC ACCTCCGGAT CTCTCCTGCT TCCGCA6CCT GTGAGCA6CA 
151 GGCCTGCTAA CTGCAGATCC ACAACCGCAC AGCTCGCTAC AGGTGCACCA 
201 TGTCTGGCTC CTACGATGAG GCCTCAGAGG AGATCACAGA TAGCnCTGG 
251 GAGGTGGGGA ACTACAAGCG GACGGTGAAG CGCATCGACG ATGGGCACCG 
301 CCTGTGCAAC GACCTCATGA GCTGCGTGCA 6GAGCGCGCC AAGATCGAGA 
351 AGGCATACGC GCAGCAGCTC ACCGACTGGG CCAAGCGCTG GCGCCAGCTC 
401 ATCGAGAAAG GTCCTCAGTA TGGCAGCCTG GAGCGGGCGT GGGGCGCCAT 
451 GATGACAGAA GCAGATAAGG TCAGCGAGCT GCACCAGGAG GTGAAGAACA 
501 GCCTGCTGAA TGAGGACCTG GAGAAAGTCA AGAACTGGCA 6AAGGATGCC 
551 TATCACAAGC AGATCATGGG TGGCTTCAAG GAGACGAAAG AG6CCGAGGA 
601 TGGCTTCCGA AAGGCCCAGA A6CCCTGGGC TAAAAAGATG AAGGAGCTAG 
651 AGGCGGCCAA GAAGGCCTAT CACnGGCTT GTAAGGAGGA AAGGCTGGCC 
701 ATGACCCGGG AGATGAACAG TAAGACAGAG CAGTCGGTCA CCCCTGAACA 
751 GCAGAAGAAA CTTGTGGACA AAGTGGACAA ATGCAGACAG GATGTGCAAA 
801 AGACTCAGGA GAAGTATGAG AAGGTCCTGG AAGATGTGGG CAAGACCACA 
851 CCACAGTACA TGGA6GGCAT GGAGCAGGTG ITTGAGCAGT GCCAGCAGH 
901 TGAGGAGAAG CGGCTGGTCT TCCTGAAGGA AGTCCTGCTG GATATCAAAC 
951 GGCATCTCAA CCTAGCGGAG AACAGCAGCT ACATGCATGT CTACCGAGAA 
1001 CTGGAGCAGG CCATCCGGGG GGCCGATGCC CA6GAGGACC TCAGGTGGH 
1051 CCGCAGCACC AGTGGCCCCG GGAT6CCCAT GAACTGGCCG CAGHCGAGG 
1101 AGTGGAACCC AGACCTCCCG CACACCACTG CCAAGAAQGA GAAACAGCCT 
1151 AAGAAGGCAG AGGGGGCCAC CCTGAGCAAT GCCACTGGGG CTGTAGAATC 
1201 CACATCCCAG GCTGGGGACC GTGGCAGTGT TAGCAGCTAT GACCGAGGCC 
1251 AAACATATGC CACCGAGTGG TCAGACGATG AGAGCGGAAA CCCCHCGGG 
1301 GGCAATGAGG CCAATGGTGG CGCCAACCCC HCGAGGATG ATGCCAAGGG 
1351 AGHCGTGTA CGGGCACTCT ATGACTACGA CGGTCAGGAG CA6GATGA6C 
1401 TCAGCnCAA G6CCGGAGAT GAGCTCACCA AGCTCGGAGA GGAAGACGAA 
1451 CAGGGHGGT GCCGCGGGCG GCTGGACAGC GGACAGCTGG GCCTCTATCC 
1501 TGCCAACTAC GTTGACGCTA TATAGCTACC HGCCCACCC GACTCCTCTC 
1551 AGTCCHGTC CACCGCCTTC CACCCTTCCC CTCCCCCTTG CCATAGAGH 
1601 CCAGACATAT mCCCATCA AGClTTTAn TTTTTAAAAG TCAAAACAGA 
1651 ACAAAAAAAA AAAAAAAAAA GAAGAAATAC GAAGAGACAG CGTTTGCAGC 
1701 CTACCTGGAG GCCG6GGGGG AGGGGGCHA GGGTGATGGC CTCCCCCACA 
1751 GCGTG6GCAA GGATCTTGGG ACTAACCCAA TGTCACATCT GGTCTATAGA 
1801 GTCCACCAAA GAGTCTCCTG AGTCTTGAGG GAGATCTTCT 6GATCCTTCT 
1851 ACCCTGTCTC GCTCTCCTAT CCCACCACAG CTGCCAGCA6 CTGCCCATGT 
1901 CACCTGAGCC TGGCTTCCTA AACTCTCCTG TCCCCTCTCC TGTCCCCCTT 
1951 CAACGCCCCC nCTCHAAA GGGCCCCCAA TCTTTAGTCT TCCACTCTGC 
2001 CCTGQGGGTG CTTTTCTCn CCCAGCCCTG TCCAGTGAGG CTGGGGGAGA 
2051 AGGCTGCGGA GGGGAGGGGA GTGTCTCnC ACTCCCCCAG ACATGAAGGC 
2101 AGGTGAGTGG GAGGGAGTCA TGGCCTCCCT GGCATACAGG AGAGGAAGAA 
2151 GGAGAACAGA CCATCTGACC AGGCTGTGCA ACACTCCCAA TGCCAAGCCC 
2201 ATTTGAGGGA TGAAAACCCT AGCTGGGCCT GTGGGCAGAG GGCTCCTCCT 
2251 CAGAGCCAAT GAGCATTTGC AGAGACCCTA CCTGTCTCn TAGTCCnGG 
2301 CAATGGGCAA AGCCTCTTCC TTGGAAAGTC CAGGGCAAAG CCAGCAACAG 
2351 TAGCAACCTC CTCTCACTCT G6GGAG6AGG CAnGGCCAC CCATCCCCCT 
2401 CCCnCATGG TCATTCAGAA ACGCCACAGC CCCTCCCATC CCCAATCACT 
2451 GTGTCAGCAT CAGCCTTTGT GAAGACGGTC TACAAGGCTC TCACCTGGCC 
2501 AACCTAGGAG ATTCAGGGGC TCAGGAACCT AGGAGAHCA GGGGCHGGG 
2551 GAACCTCCAC CTTGGCACTG TAAGGGGAAG CCAGCAGCTC AGGCTGGTGT 
2601 GAGGAAGGAA CTCTGGATGG TCACTGTAGC HTCTTCCTT GACCHTTAG 
2651 TCCCCAACAT CCCCTCTGAA TGCTGGCAGC ACCCCCACCC CCACACACAC 
2701 ACTCCCATTT CTCTAAGCCC GAGAGTCHG AGTCnCAH AAAGGAHCT 
2751 GGGTGTGGGA GGGGACACAG GGCCTTGTGG HGGGAAGCA GGTGGCAGGC 
2801 TCTCCCnGG GAGGATGGGG TGGGAAACGA AACAGGTCAA CCAAGACCTC 
2851 TTACAGTGGA AAGTGGTCAG AGGCTGTTTC HIGGACCTT TGGGAACACA 
2901 GATHGAGAA AGTCTCATAT TCACAGCTGG TGTCCGCTAG GCCTCTGGCC 
2951 TACGGACACC CTCTGCCTTG TGAATCAGGT GACCTnTGG GCCTCCAGGG 
3001 AAAGAACAGG ACCACCATCC ATGTTCTCCG CGTCCCTTTA GCTCTCTGCT 
3051 GCnCTCCTG ACACTCAGGT CATGGACCCA AGCHIGGGG TCCTGACCAC 
3101 CGCCCCCCCC CACCCCCCTT CTCTTGACTA GGCTGCAGCA GGGCCHCTG 
3151 TTGGGTCAGT CCTCCTCAGG GCCAGGAGCA GGAACHAGC ACTCAAGAGA 
3201 CAGGGCTGTA AGCACCCACT TCCCTGTCAC TGriTGCCCT TGGGGCHCA 
3251 GCTGCAGCCC AGGHGGGCC CTGGAGCCCT CAGAACCGGA AGCAGGAHC 
3301 AAACCTCCCC HCTCCACAG CCCCCCCTGC CTCCCCAGAT GGTAGACATC 
3351 CCCCAGCTCT TACCHCACC CTCATCTCAG AAAGGCAAGA AGCCGCCATG 
3401 TCCGCACCn GGGGCCTGGG CTTCCCCCTC TCTGTGCCAG CGGHCCCAG 
3451 CACCTGGGGA GGGGCTGTGG CCTGACCAGA CCCCAGGCCC ACCCCACATA 
3501 GTATACTAGC TGCCCACTCT GGGGCAGGAA CTGGAAAATC CATCCCTTTT 
3551 GAACAACCAC CTTCAATGAC CCCCCCCATC TGGGACCAGA CTTGGTCCTC 
3601 AAGnATTCA 6CACCCCCAG TGCAGGAGQ6 TCCTCCCCCC ACCCCCCGAA 
3651 GTCCCTGGAG CCCGGAGCAG AGCCCCACCT GIGAHCCTG GTGHAGGGC 
3701 ACCTCAAACC TTGGGCT6GA CCACACCCCT TCCCGCCAH TCCAGACCCC 
3751 TACCTGTACT CCCCAGTGCT CCCCA6GGGC CTCHGATGC TGCACGGGAC 
3801 CCTGCAG6GC TCGGTCAGTG ATGTGTnTG TCCCCAGHA ACCGCCATCC 
3851 AGCGACCTGG TTCCAGGAGG AGCTCAGGTC ACCCCCACCA CCGCCGCCAC 
3901 TGCGTCTGCC GCCCTAGGCT HCAGACATC ATTAGHCCG ACACHGTGA 
3951 AACTCCGAGA CGTGCCGTGG TCTCAGCAAT GCACCTGTTT TATACATGAT 
4001 TGIGTAATTT AAAGGTATAT AAATACAAAT ATATATATTA TATCTATATC 
4051 TATCAGTTGT GACCGTATGG CTGTCGATAA AACCAGAAH C 
1 GAATTCGTCG ACCCACGGTC CGGGAAGCCT TTCACAAGCA GATGATGGGC 

51 GGCnCAAGG AGACCAAGGA AGCTGAGGAC GGCHTCGGA AGGCACAGAA 

101 GCCCTGGGCC AAGAAGCTGA AAGAGGTAGA AGCAGCAAAG AAA6CCCACC 

151 ATGCAGCGTG CAAAGAGGAG AAGCTGGCTA TCTCACGAGA AGCCAACAGC 

201 AAGGCAGACC CATCCCTCAA CCCTGAACAG CTCAAGAAAT TGCAAGACAA 

251 AATAGAAAAG TGCAAGCAAG ATGTTCTTAA GACCAAAGAG AAGTATGAGA 

301 AGTCCCTGAA GGAACTCGAC CAGGGCACAC CCCAGTACAT GGAGAACATG 

351 GAGCAGGTGT TTGAGCAGTG CCAGCAGHC GAGGAGAAAC GCCnCGCH 

401 CTTCCGGGAG GHCTGCIGG AGGTTCAGAA GCACCTA6AC CTGTCCAATG 

451 TGGCTGGTTA CAAAGCCATT TACCATGACC TGGAGCAGAG CATCAGAGCA 

501 GCTGATGCAG TGGAGGACCT GAGGTGGHC CGAGCCAATC ACGGGCCGGG 

551 CATGGCCATG AACTGGCCGC AGTTTGAGGA GTGGTCCGCA GACCTGAATC 

601 GAACCCTCAG CCGGAGAGAG AAGAAGAAGT CCACTGACGG CGTCACCCTG 

651 ACGGGCATCA ACCAGACAGG CGACCAGTCT CTGCCGAGTA AGCCCAGCAG 

701 CACCCHAAT GTCCCGAGCA ACCCCGCCCA GTCTGCGCAG TCACAGTCCA 

751 GCTACAACCC CTTCGAGGAT GAGGACGACA CGGGCAGCAC CGTCAGTGAG 

801 AAGGACGACA CTAAGGCCAA AAATGTGAGC AGCTACGAGA AGACCCAGAG 

851 CTATCCCACC GACTGGTCA6 ACGATGA6TC TAACAACCCC TTCTCCTCCA 

901 CGGATGCCAA TGGGGACTCG AATCCAHCG ACGACGACGC CACCTCGGGG 

951 ACGGAAGTGC GAGTCCGGGC CCTGTATGAC TATGAGGGGC AGGAGCATGA 

1001 TGAGCTGAGC TTCAAGGCTG GGGATGAGCT GACCAAGATG GAGGACGAGG 

1051 ATGA6CAGGG CTGGTGCAAG GGACGCHGG ACAACGGGCA AGTTGGCCTA 

1101 TACCCGGCAA ATTATGTGGA GGCGATCCA6 TGA 

FIG. 48 



1 RIRRPTVREA FHKQMMGGFK ETKEAEDGFR KAQKPWAKKL KEVEAAKKAH 
51 HAACKEEKLA ISREANSKAD PSLNPEQLKK LQDKIEKCKQ DVLKTKEKYE 
101 KSLKELDQGT PQYMENMEQV FEQCQQFEEK RLRFFREVLL EVQKHLDLSN 
151 VAGYKAIYHD LEQSIRAADA VEDLRWFRAN HGPGMAMNWP QFEEWSADLN 
201 RTLSRREKKK STDGVTLTGI NQTGDQSLPS KPSSTLNVPS NPAQSAQSQS 
251 SYNPFEDEDD TGSTVSEKDD TKAKNVSSYE KTQSYPTDWS DDESNNPFSS 
301 TDANGDSNPF DDDATSGTEV RVRALYDYEG QEHDELSFKA GDELTKMEDE 
351 DEQGWCKGRL DNGQVGLYPA NYVEAIQ 
1 AAAGGGAGG AGAGTGTCAA AAAGAAGGAT 

30 GGCGAGGAAA AAGGCAAACA GGAAGCACAA GACAAGCTGG 
70 GTCGGCTTTT CCATCAACAC CAAGAACCAG CTAAGCCAGC 
110 TGTCCAGGCA CCCTGGTCCA CTGCAGAAAA AGGGTCCACT 
150 TACCATTTCT GCACA6GAAA ATGTAAAAGT GGTGTATTAC 
190 CGGGCACTGT ACCCCTTTGA ATCCAGAAGC CATGATGAAA 
230 TCACTATCCA GCCAGGAGAC ATAGTCATGG TGGATGAAAG 
270 CCAAACTGGA GAACCCGGCT GGCHGGAGG AGAATTAAAA 
310 GGAAAGACAG GGIGGHCCC TGCAAACTAT GCAGAGAAAA 
350 TCCCAGAAAA TGAGGTTCCC GCTCCAGTGA AACCAGTGAC 
390 TGAHCAACA TCTGCCCCTG CCCCCAAACT GGCCHGCGT 
430 GAGACCCCCG CCCCTnGGC AGTAACCTCT TCAGAGCCCT 
470 CCACGACCCC TAATAACTGG GCCGACTTCA 6CTCCACGTG 
510 GCCCACCAGC ACGAATGAGA AACCAGAAAC GGATAACTGG 
550 GATGCATGGG CAGCCCAGCC CTCTCTCACC GTTCCAAGTG 
590 CCGGCCAGH AAGGCAGAGG TCCGCCTTTA CTCCAGCCAC 
630 GGCCACTGGC TCCTCCCCGT CTCCTGTGCT AGGCCAGGGT 
670 GAAAAGGTGG AGGGGCTACA AGCTCAAGCC CTATATCCH 
710 GGAGAGCCAA AAAAGACAAC CACTTAAAH HAACAAAAA 
750 TGATGTCATC ACCGTCCTGG AACAGCAAGA CATGTGGTGG 
790 TTTGGAGAAG TTCAAGGTCA GAAGQGnGG HCCCCAAGT 
830 CHACGTGAA ACTCAHTCA GGGCCCATAA GGAAGTCTAC 
870 AAGCATGGAT TCTGGTTCTT CAGAGAGTCC TGCTAGTCTA 
910 AAGCGAGTAG CCTCTCCAGC AGCCAAGCCG GTCGTHCGG 
950 GAGAAGAAAT TGCCCAGGTT AHGCCTCAT ACACCGCCAC 
990 CGGCCCCGAG CAGCTCACTC TCGCCCCTGG TCAGCTGAH 
1030 HGATCCGAA AAAAGAACCC AGGTGGATGG TGGGAAGGAG 
1070 AGCTGCAAGC ACGTGGGAAA AAGCGCCAGA TAGGCTGGH 
1110 CCCAGCTAAT TATGTAAAGC HCTAAGCCC TGGGACGAGC 
1150 AAAATCACTC CAACAGAGCC ACCTAAGTCA ACAGCAHAG 
1190 CGGCAGTGT6 CCA6GTGATT GGGATGTACG ACTACACCGC 
1230 GCAGAATGAC GATGAGCTGG CCHCAACAA GGGCCAGATC 
1270 ATCAACGTCC TCAACAAGGA GGACCCTGAC TGGTGGAAAG 
1310 GAGAAGTCAA TGGACAAGTG GGGCTCHCC CATCCAATTA 
1350 TGTGAAGCTG ACCACAGACA TGGACCCAAG CCAGCAATGA 
1 KGRRVSKRRM ARKKANRKHK TSWVGFSINT KNQLSQLSRH 
41 PGPLQKKGPL TISAQENVKV VYYRALYPFE SRSHDEITIQ 
81 PGDIVMVDES QTGEPGWLGG ELKGKTGWFP ANYAEKIPEN 
121 EVPAPVKPVT DSTSAPAPKL ALRETPAPLA VTSSEPSTTP 
161 NNWADFSSTW PTSTNEKPET DNWDAWAAQP SLTVPSAGQL 
201 RQRSAFTPAT ATGSSPSPVL GQGEKVEGLQ AQALYPWRAK 
241 KDNHLNFNKN DVITVLEQQD MWWFGEVQGQ KGWFPKSYVK 
281 LISGPIRKST SMDSGSSESP ASLKRVASPA AKPVVSGEEI 
321 AQVIASYTAT GPEQLTLAPG QLILIRKKNP GGWWEGELQA 
361 RGKKRQIGWF PANYVKLLSP GTSKITPTEP PKSTALAAVC 
401 QVIGMYDYTA QNDDELAFNK GQIINVLNKE DPDWWKGEVN 
441 GQVGLFPSNY VKLTTDMDPS QQ 

FIG. 51 



1 GAATTCGCGG CCGCGTCGAC CAAGATCATT CCTGGGAGTG 
41 AAGTAAAACG 6GAAGAACCA GAAGCTHGT AT6CAGCTGT 
81 AAATAAGAAA CCTACCTCGG CAGCCTATTC AGTTGGAGAA 
121 GAATATATTG CACTHATCC ATATTCAAGT GT66AACCTG 
161 GAGATTTGAC TTTCACAGAA GGTGAAGAAA TATTGGTGAC 
201 CCAGAAAGAT GGAGAGTGGT GGACAGGAAG TATTGGAGAT 
241 AGAAGTGGAA TTHTCCATC AAACTATGTC AAACCAAAGG 
281 ATCAAGAGAG TTTTGGGAGT GCTAGCAAGT CTGGAGCATC 
321 AAATAAAAAA CCTGAGAHG CTCAG6TAAC TTCAGCATAT 
361 GHGCTTCTG GHCTGAACA ACnAGCCTT GCACCAGGAC 
401 AGTTAATAn AATTCTAAAG AAAAATACAA GTGGGTGGTG 
441 GCAAGGAGAG TTACAGGCCA GAGGAAAAAA GCGACAGAAA 
481 GGATGGHTC CTGCCAGTCA TGTTAAACTT TTGGGTCCAA 
521 GCAGTGAAAG AGCCACACCT GCCTTTCATC CTGTATGTCA 
561 GGTGATTGCT ATGTATGACT ATGCAGCAAA TAATGAAGAT 
601 GAGCTCAGTT TCTCCAAGGG ACAACTCATT AATGHATGA 
641 ACAAAGATGA TCCTGATTGG TGGCAAGGAG AGATCAACGG 
681 GGTGACTGGT CTCTHCCTT CAAACTACGT TAAGATGACG 
721 ACAGACTCAG ATCCAAGTCA ACAGTGA

1 EFAAASTKII PGSEVKREEP EALYAAVNKK PTSAAYSVGE
41 EYIALYPYSS VEPGDLTFTE GEEILVTQKD GEWWTGSIGD
81 RSGIFPSNYV KPKDQESFGS ASKSGASNKK PEIAQVTSAY
121 VASGSEQLSL APGQLILILK KNTSGWWQGE LQARGKKRQK
161 GWFPASHVKL LGPSSERATP AFHPVCQVIA MYDYAANNED
201 ELSFSKGQLI NVMNKDDPDW WQGEINGVTG LFPSNYVKMT 
241 TDSDPSQQ

HSLHLHRHQGRKERARYDLEAAQDNELTFKAGEIMTVLDDSDPNWWKGETHQGIGLFPSN 60
FVTADLTAEPEMIKTEKKTVQFSDDVQVETIEPEPEPAFIDEDKMDQLLQMLQSTDPSDD 120
QPDLPELLHLEAMCHQMGPLIDEKLEDIDRKHSELSELNVKVMEALSLYTKLMNEDPMYS 180 
MYAKLQNQPYYMQSSGVSGSQVYAGPPPSGAYLVAGNAQMSHLQSYSLPPEQLSSLSQAV 240 
VPPSANPALPSQQTQAAYPNRSPGDLMKPGDSECRGSAEDSQMRISPPYFPTGQQA 296 

FIG. 55 



IRGRVDQGEWPLPGRGTPGPSGLCVPEDQCRVRDLKGWLDSFWAKAEKEE 50
ENRRLEEKRWAEEAQRQLEQERRERELREAARREQRYQEQGGEASPQSRT 100
WEQQQEVVSRNRNEQESAVHPREIFKQKERAMSTTSISSPQPGKLRSPFL 150 
QKQLTQPETHFGREPAAAISRPRADLPAEEPAPSTPPCLVQAEEEAVYEE 200 
PPEQETFYEQPPLVQQQGAGSEHIDHHIQGQGLSGQGLCARALYDYQAAD 250 
DTEISFDPENLITGIEVIDEGWWRGYGPDGHFGMFPANYVELIDEAEGTS 300 
CPSPLRHGFLIAGRGGLGVDIQHSSRNRTPSEDEASGLPPAWQTQPVTPN 350
AAMAW 355

GRVDIERKRLELMQKKKLEDEAARKAKQGKENLWKENLRKEEEEKQKRLQEEKTQEKIQE 60
EERKAEEKQRETASVLVNYRALYPFEARNHDEMSFNSGDIIQVDEKTVGEPGWLYGSFQG 120
NFGWFPCNYVEKMPSSENEKAVSPKKALLPPTVSLSATSTSSEPLSSNQPASVTDYQNVS 180 
FSNLTVNTSWQKKSAFTRTVSPGSVSPIHGQGQVVENLKAQALCSWTAKKDNHLNFSKHD 240 
IITVLEQQENWWFGEVHGGRGWFPKSYVKIIPGSEVKREEPEALYAAVNKKPTSAAYSVG 300
EEYIALYPYSSVEPGDLTFTEGEEILVTQKDGEWWTGSIGDRSGIFPSNYVKPKDQESFG 360 
SASKSGASNKKPEIAQVTSAYVASGSEQLSLAPGQLILILKKNTSGWWQGELQARGKKRQ 420 
KGWFPASHVKLLGPSSERATPAFHPVCQVIAMYDYAANNEDELSFSKGQLINVMNKDDPD 480 
WWQGEINGVTGLFPSNYVKMTTDSDPSQQ 509

CACTCTCTACACTT6CACCGGCATCAAGGACGAAAA6AAC 40
GCGCTAGATATGACTTGGAAGCTGCTCAAGACAATGAACT 80 
TACTTTCAAAGCTGGAGAAATTATGACAGTTCTTGAT6AC 120 
AGT6ATCCTAACTGGTGGAAAGGTGAAACCCATCAAGGCA 160 
TAGGGnATTTCCnCTAATnTGTGACTGCAGATCTCAC 200 
TGCTGAACCAGAAATGAHAAAACAGAGAAGAAGACGGTA 240 
CAATTTAGTGATGATGTTCAGGTAGAGACAATAGAACCAG 280 
AGCC66AACCAGCCTTTATTGAT6AAGATAAAATGGACCA 320 
GnGCTACAGATGCTGCAAAGTACAGACCCCAGTGATGAT 360 
CAGCCA6ACCTACCAGAGCT6CTTCATCTTGAAGCAATGT 400 
GTCACCAGATGGGACCTCTCATTGATGAAAAGCTGGAAGA 440 
TATTGATAGAAAACATTCAGAACTCTCAGAACTTAATGTG 480 
AAAGTGATGGAGGCCCTTTCCnATATACCAAGTTAATGA 520 
ACGAAGATCCGATGTAnCCATGTATGCAAAGnACAGAA 560 
TCAGCCATAnATATGCAGTCATCTGGTGTTTCTGGTTCT 600 
CAGGTGTATGCAGGGCCTCCTCCAAGTGGTGCCTACCTGG 640 
TTGCAGGGAAC6CGCA6ATGAGCCACCTCCAGAGCTACAG 680 
TCTTCCCCCGGAGCAGCT6TCTTCTCTCAGCCAGGCAGTG 720 
GTCCCACCATCCGCAAACCCAGCCCnCCTAGTCAGCAGA 760 
CTCAQGCC6CTTACCCAAACC6CTCCCCAGGGGACCTCAT 800 
GAAGCCCGGTGAnCTGAATGCCGTGGATCTGCCGAGGAT 840 
TCCCAGATGCGTATrrCTCCTCCGTACTTCCCCACAGGAC 880 
AGCAGGCnGAATAGCTGATTGCCTATGCAGGACAACAGG 920 
CTTGAATAGCTGACTGCCTATGCATTCTCTTTGCTTGCCA 960 
GTTTTTTGGACATCAAACTTGACAGATCCAAGATTATTAC 1000 
TTTGATCTTCCCCACACCCCTCCCACCCCCGAGTCTACTA 1040 
TGGTCCCATCATAGTATTCTGAAAATCAGTGAATGGCCAC 1080 
TCTACCAGTTATTTCTACCAGTmTAGGTTCTAAACCTC 1120 
AGGCAnCTGGACTCTTCTGnCATTATCATATTTTGAAG 1160
GCATTATCTTCAAAATCTATCTAGACTCTGACCCTTTCTC 1200 
CCATCTCCACCAnACTGCCGTGGCTCnCTGCTGGTCGG 1240 
CTCTCTCCTGGTGGATCCGTAATAACCTGCAGTCAGCTAT 1280 
CCTGGTCCAGAAGGGAACCCCGnAAACCCTGTTGGAATC 1320 
TTATCACGCnCTGCTCCAGAACGAACCCAGTCTGTCTGT 1360 
CTCACTCAGAGTGTAA6CTACAGTCCTTATTGTGGCCATC 1400 
AGGTGCTGTGTGTTCTCCAGCCCCCTCCCCACCACCGCAG 1440 
TCCTGCCG6TGATCTTAGCTGCTCTCCCCTCGGAACCCCC 1480 
TGCG6CCCCCTCTGCCGCAACAXTCGTG6CCTGCTGTTCC 1520 
nGAACATGCTTGGTGnrrCTCTCCTCAAAGGCncnT 1560 
CTGITTACCTGAAATGTACTTGCCTAGGGAAATCTTATCC 1600 
TGGCTCACTCCGCTTACTmTTCCACATCTTTGCTTAAA 1640 
GnAnGCCCnATTGGAGAAGGCACCCCTACCATAAACT 1680
AGAAATCCCTTGCCCCCAAGCTGCTCCTTT 1710

GAATTCGCGGCC6CGTCGACCAAGGAGAGTGGCCGCTTCC 40
AGGACGTGGGACCCCAGGCCCCAGTGGGCTCTGTGTACCA 80 
GAAGACCAATGCCGTGTCAGAGAniAAAGGGTTGGTTAG 120 
ACAGCTTCTGGGCCAAAGCAGAGAAGGAGGAGGAGAACCG 160 
TCGGCTGGAGGAAAAGCGGTGGGCCGAGGAGGCACAGCGG 200 
CAGCTGGAGCAGGAGCGCCGGGAGCGTGAGCTGCGTGAGG 240 
CTGCACGCCGGGAGCA6CGCTATCAGGA6CAGGGTGGCGA 280 
GGCCAGCCCCCAGAGCAGGACGTGGGAGCAGCAGCAAGAA 320 
GTGGnTCAAGGAACCGAAATGAGCAGGAGTCTGCCGTGC 360 
ACCCGAGGGAGAnnCAAGCAGAAGGAGAGGGCCATGTC 400 
CACCACCTCCATCTCCAGTCCTCAGCCTGGCAAGCTGAGG 440 
AGCCCCTTCCTGCAGAAGCAGCTCACCCAACCAGAGACCC 480 
ACTnGGCAGAGAGCCAGCTGCTGCCATCTCAAGGCCCAG 520 
GGCAGATCTCCCTGCTGAG6AGCCG6CGCCCAGCACTCCT 560 
CCATGTCTGGTGCAGGCAGAAGAGGAGGCTGTGTATGAGG 600 
AACCTCCAGAGCAGGAGACCnCTACGAGCAGCCCCCACT 640 
GGTGCAGCAGCAAGGTGCTGGCTCTGA6CACATTGACCAC 680 
CACAnCAGGGCCAGGGGCTCAGTGGGCAAGGGCTCTGTG 720 
CCCGTGCCCTGTACGACTACCA6GCAGCCGACGACACAGA 760 
GATCTCCTTTGACCCCGAGAACCTCATCACGGGCATCGAG 800 
6TGATCGACGAA6GCTG6TGGCGT6GCTAT6GGCCGGATG 840 
GCCATTTTGGCATGnCCCTGCCAATTACGTGGAGCTCAT 880 
TGATGAGGCTGAGGGCACATCnGCCCTTCCCCTCTCAGA 920 
CATGGCnCCnATTGCTGGAAGAGGAGGCCTGGGAGTTG 960 
ACAnCAGCACTCnCCAGGAATAGGACCCCCAGTGAGGA 1000 
TGAGGCCTCAGGGCTCCCTCCGGCTTGGCAGACTCAGCCT 1040 
GTCACCCCAAATGCAGCAATGGCCTGGTGATTCCCACACA 1080 
TCCnCCTGCATCCCCCGACCCTCCCAGACAGCTTGGCTC 1120
TTGCCCCTGACAGGATACT6AGCCAAGCCCTGCCTGTGGC 1160
CAAGCCCTGAGTGGCCACTGCCAAGCTGCGGGGAAGGGTC 1200 
CTGAGCAGGGGCATCTQGGAQGCTCTGGCTGCCTTCTGCA 1240 
TTTAnTGCCnTTTTCTTTnCTCnGCTTCTAAGGGGT 1280 
GGTGGCCACCACTGinAGAATGACCCTTGGGAACAGTGA 1320 
ACGTAGAGAATTGTmTAGCAGAGTnGTGACCAAAGTC 1360 
AGAGTGGATCATGGTGGTTTGGCAGCAGGGAATnGTCTT 1400 
6TTGGAGCCTGCTCTGTGCTCCCCACTCCATTTCTCTGTC 1440 
CCTCTGCCTGGGCTATGGGAAGTGGGGATGCAGATGGCCA 1480 
AGCTCCCACCCTGGGTAnCAAAAACGGCAGACACAACAT 1520
GTTCCTCCACGCGGCTCACTCGATGCCTGCAGGCCCCAGT 1560 
GTGTGCCTCAACTGATTCTGACnCAGGAAAAGTAACACA 1600 
GAGTGGCCTTGGCCTGnGTCnCCCCTATlTTCTGTCCC 1640 
AGCTCATCCGTGGTCGAAGCGCCCGCGAATTCCAGCTGAG 1680 
CGGCCGC 1687

GCGGCCGCGTCGACATTGAAAGGAAAAGAnAGAACTAAT 40
GCAGAAAAAGAAACTAGAAGATGAGGCTGCAAGGAAAGCA 80 
AAGCAAGGAAAAGAAAACnATGGAAAGAAAATCTTAGAA 120 
AGGAGGAAGAAGAAAAACAAAAGCGACTCCAGGAAGAAAA 160 
AACACAAGAAAAAATTCAAGAAGAGGAACGGAAAGCTGAG 200 
GAGAAACAACGTGAGACAGCTAGTGnnGGTGAAHATA 240 
GAGCAnATACCCCTTTGAAGCAAGGAACCATGATGAGAT 280 
GAGTTTTAAnCTGGAGATATAATTCAGGTTGATGAAAAA 320 
ACCGTAGGAGAACCTGGTTGGCTTTATGGTAGTniCAAG 360 
GAAATTTTGGCTGGnTCCATGCAAnATGTAGAAAAAAT 400 
GCCATCAAGTGAAAATGAAAAAGCTGTATCTCCAAAGAAG 440 
GCCTTACTTCCTCCTACAGTTTCTTTATCTGCTACCTCAA 480 
CTTCCTCTGAACCACTTTCTTCAAATCAACCAGCATCAGT 520 
GACTGATTATCAAAATGTATCniTTCAAACCTAACTGTA 560 
AATACATCATGGCAGAAAAAATCAGCCTTCACTCGAACTG 600 
TGTCCCCTGGATCTGTATCACCTATTCATGGACAGGGACA 640 
AGTGGTAGAAAACnAAAAGCACAGGCCCniGnCCTGG 680 
ACTGCAAAGAAAGATAACCACHGAACnCTCAAAACATG 720 
ACATTATTACTGTCnGGAGCAGCAAGAAAAnGGTGGn 760 
TGGGGAGGTGCATGGAGGAAGAGGATGGTTTCCCAAATCT 800 
TATGTCAAGATCAnCCTGGGAGTGAAGTAAAACGGGAAG 840 
AACCAGAAGCTnGTATGCAGCTGTAAATAAGAAACCTAC 880 
CTCGGCAGCCTAnCAGTTGGAGAAGAATATAnGCACTT 920 
TATCCATATTCAAGTGTGGAACCTGGAGAnTGACTTTCA 960 
CAGAAGGTGAAGAAATATTGGTGACCCAGAAAGATGGAGA 1000 
GTGGTGGACAGGAAGTATTGGAGATAGAAGTGGAATTTTT 1040 
CCATCAAACTATGTCAAACCAAAGGATCAAGAGAGTTTTG 1080 
GGAGTGCTAGCAAGTCTGGAGCATCAAATAAAAAACCTGA 1120 
GAnGCTCAGGTAACTTCAGCATATGnGCTTCTGGTTCT 1160
GAACAACnAGCCnGCACCAGGACAGTTAATATTAATTC 1200 
TAAAGAAAAATACAAGTGGGTGGTGGCAAGGAGAGTTACA 1240 
GGCCAGAGGAAAAAAGCGACAGAAAGGATGGTTTCCTGCC 1280 
AGTCATGnAAACTniGGGTCCAAGTAGTGAAAGAGCCA 1320 
CACCTGCCTTTCATCCTGTATGTCAGGTGATTGCTATGTA 1360 
TGACTATGCAGCAAATAATGAAGATGAGCTCAGITTCTCC 1400 
AAGGGACAACTCATTAATGTTATGAACAAAGATGATCCTG 1440 
AnGGTGGCAAGGAGAGATCAACGGGGTGACTGGTCTCTT 1480 
TCCnCAAACTACGnAAGATGACGACAGACTCAGATCCA 1520 
AGTCAACAGTGACCCAATGTTGTCnCCAGnGTGAAAGC 1560 
ACCCCAGAGACCCACTATCCAAGniCACTCTAGCGTGGA 1600 
GGCAGGGCAGGCAGCCCTGATCAAATATCTGCTACACAAT 1640
TCGTTTACTTCGTTTGAATGTTAGAGCCACTTGTGAnAT 1680 
nTrTTGTGTnCTAACnACAGTTTAAATTTAnTGTAA 1720
AAAGTTAAAGGATAGTGGGTCTTTGTGTGGCTnCCCTGC 1760
TGTTCACTCTGGCATCTnAGCATTnTCnCTTTTrTAA 1800 
TTTGATAAnGTAGGTCAnAGCATGCATAnGAGTnGC 1840 
CCTTATGT6GTGGGAGTTCAAACACACAAAGACCCACTAT 1880 
TTGCACAAACTATTCTTACTGGTnGGAATAGGCTGCCAT 1920
GCTTnrrrAATGTTATTGCAACATGTGTATTCATTTACAG 1960 
AAnCAGATAAAATTTGCTTATGnCTGCTAnATGTTTG 2000 
ATCTAATCCTAATCACAGTGAGCTCTTAAnAGCTCAATA 2040 
TGTGGTTTGCCCTCAAGTGTGCACTGnTAnACnTGTA 2080 
ATATGCCACTATGAGTACTGACATTTAGATATGnTAAAG 2120 
GCCAAGAACTGGAAACAGCCATGCCCTGTITTCTGTGTAT 2160 
TTGGGATGGGAATAACAACATTTTGGGG6GAGCTTTTTAA 2200 
ATCTCAGAGAAGAGGAAAGTGGCCTGCTCTGGCAGGTATG 2240 
TGCAGTGnTCATnGTTCCAGTCCCAAGAATGAGCACTG 2280 
TCCTATGGTAGnCGCTTAGGATCTTTATGTGCTCTGGGC 2320 
TAATGAAGGTACTGCATCATGTGCTGCAGCGTGTGTATTC 2360 
TnnCGATGACCTATAAAGGGAnATTnTGAGGAATGA 2400 
AAGGCTCCCATCAnGACTGTGAGATGGGAAAAACCrrrC 2440 
CTAGCnAGAGCATTTATATCnAATCCATnTAAAGTCA 2480 
GAGTTCAnGnACCTGTTTTAATCAGGTGACTACATGTC 2520 
CCAGTATACAAAGGGGCACTGGnGACAnCTTCTTAATG 2560 
TATTTAGTAAATATCATAAGAAATCCmAAGAGnTAAA 2600 
TGTCCCCAAAACAGACATGCGGGCTCTAGTCAAGAATGAA 2640 
TTAGAGTGAAGGAAAGCTGTGTAACACCTGGCAnCCTCT 2680 
GTGnCATGGAGCTTCTTTGAGGCTCTAAGATTGATTnA 2720 
CCATCAGACnCTCTAATACCTGTTCTTCAACCATATTGG 2760 
CTACTrTGACATAAGAAnTACnCTrrrCCTGGAATGGA 2800 
AAACACTnAAAAAATAATAACAAACATTATTATAAACTA 2840 
ATATATGTGAGAGGTCGACGCGGCCGCGAATTC 2873

GAAnCGTCGACCCACGCGTCCGAAATATAACTGAAGnGGGGCACCTAC 50
TGAAGAAGAGGAAGAAAGTGAAAGTGAAGATAGTGAAGACAGTGGTGGGG 100 
AGGAAGAAGATGCAGAGGAGGAAGAGGAAGAGAAAGAGGAAAATGAATCT 150 
CACAAATGGTCAACCGGTGAAGAATACATCGCTGTTGGAGAinTACTGC 200 
TCAGCAAGnGGAGATCTTACATTTAAGAAAGGGGAAAnCTCCnGTAA 250 
nGAAAAAAAACCTGATGGTTGGTGGATAGCTAAGGATGCCAAAGGAAAT 300 
GAAGGTCnGnCCCAGAACCTACCTAGAGCCTTATAGTGAAGAAGAAGA 350 
AGGCCAAGAGTCAAGTGAAGA6G6CAGTGAAGAAGATGTAGAGGCGGTGG 400 
ATGAAACAGCAGATGGAGCAGAAGTTAAGCAAAGAACTGATCCCCACTGG 450 
AGTGCTGnCAGAAAGCGATTTCAGAGGCGGGCATCTTCTGTCnGnAA 500 
TCATGICTCGTITTGCTACCTAATAGnCTGATCCGTCCCTAA 543

GAATTCGGCGGACnCGCGGCCGCGTCGACGAAGAAACCT 40
GAAGGACACACTAGGCCTCGGCAAGACGCGCAGGAAGACC 80 
AGCGCGCGGGATGCGTCCCCCACGCCCAGCAC6GACGCCG 120 
AGTACCCCGCCAATGGCAGCGGCGCCGACCGCATCTACGA 160 
CCTCAACATCCCGGCCnCGTCAAGnCGCCTATGTGGCC 200 
GAGCGGGAGGATGAGnGTCCCTGGTGAAGGGGTCGCGCG 240 
TCACCGTCATGGAGAAGTGCAGCGACGGnGGTGGCGGGG 280 
CAGCTACAACGGGCAGATCGGCTGGnCCCCTCCAACTAC 320 
GTCTTGGAGGAGGTGGACGAGGCGGnGCGGAGTCCCCAA 360 
GCnCCTGAGCCTGCGCAAGGGCGCCTCGCTGAGCAATGG 400 
CCAGGGCTCCCGCGTGCTGCATGTGGTCCAGACGCTGTAC 440 
CCCnCAGCTCAGTCACCGAGGAGGAGCTCAACTTCGAGA 480 
AGGGGGAGACCATGGAGGTGAnGAGAAGCCGGAGAACGA 520 
CCCCGAGTGGTGGAAAT6CAAAAATGCCCGGGGCCAGGTG 560 
GGCCTC6TCCCCAAAAACTACGTGGTGGTCCTCAGTGACG 600 
GGCCTGCCCTGCACCCTGCGCACGCCCCACAGATAAGCTA 640 
CACCGGGCCCTCGTCCAGCGGGCGCTTCGCGGGCAGAGAG 680 
TGGTACTACGGGAACGTGACGCGGCACCAGGCCGAGTGCG 720 
CCCTCAACGAGCGGGGCGTGGAGGGCGACTTCCTCAnAG 760 
GGACAGCGAGTCCTCGCCCAGCGACTTCTCCGTGTCCCn 800 
AAAGCGTCAGGGAAGAACAAACACnCAAGGTGCAGCTCG 840 
TGGACAATGTCTACTGCATTGGGCAGCGGCGCnCCACAC 880 
CATGGACGAGCTGGTGGAACACTACAAAAAGGCGCCCATC 920 
nCACCAGCGAGCACGGGGAGAAGCTCTACCTCGTCAGGG 960 
CCCTGCAGTGA 971

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)



This International Searching Authority found multiple inventions in this international application, as follows:
Please See Extra Sheet. 



1. As all required additional search fees were timely paid by the applicant, this international search report covers all searchable
claims.



2. As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
of any additional fee.

3. As only some of the required additional search fees were timely paid by the applicant, this international search report covers
only those claims for which fees were paid, specifically claims Nos.:



4. No required additional search fees were timely paid by the applicant. Consequently, this international search report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.:



Remark on Protest: The additional search fees were accompanied by the applicant's protest.

[ ] No protest accompanied the payment of additional search fees.

INTERNATIONAL SEARCH REPORT



International application No.
PCT/US96/044$4



A. CLASSIFICATION OF SUBJECT MATTER:
US CL:

435/6, 7.1, 7.5, 172.1, 240.1, 320.1; 530/300, 350, 387.9; 536/23.5

B. FIELDS SEARCHED 

Electronic data bases consulted (Name of data base and where practicable terms used):

APS, STN, DIALOG

Search terms: library, gene expression, peptide, avidin, biotin, multiple antigen peptide, phage display, antibody, SH3,
SH2, zinc finger, leucine zipper

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING
This ISA found multiple inventions as follows:

This application contains the following inventions or groups of inventions which are not so linked as to form a single
inventive concept under PCT Rule 13.1. In order for all inventions to be examined, the appropriate additional
examination fees must be paid.

Group I, claim(s) 1-52, 69-73, 89, 90 and 94-97, drawn to methods of identifying a polypeptide comprising a functional
domain of interest.

Group II, claim(s) 53-68, 74, 75, 79, 80 and 101-102, drawn to a purified polypeptide, kits containing said purified
polypeptide and methods of screening for a potential drug candidate.

Group III, claim(s) 76-78, 81-88 and 100, drawn to DNA encoding a polypeptide, a vector comprising said DNA, a
recombinant cell and methods of producing a fusion protein.

Group IV, claim(s) 91-93, drawn to a method of determining the potential pharmacological activities of a molecule.
Group V, claim(s) 98 and 99, drawn to an antibody.

The inventions listed as Groups I-V do not relate to a single inventive concept because, under
PCT Rule 13.2, they lack the same or corresponding special technical features for the following reasons:

The invention of Group I is drawn to a method of identifying a polypeptide comprising a functional domain of interest,
and as claimed, does not require the products of Groups II, III and V. The polypeptide and kits of Group II have a
defined SEQ ID, which are not required in the method of Group I. In addition, functional domains such as SH3 domains
are known in the art (see, for example, Cheadle et al., J. Biol. Chem. Vol. 269, No. 39, pages 24034-24039 (1994)).
Further, the method of Group IV also lacks the technical feature of Group I, as Group IV does not require the use of a
multivalent recognition unit complex.

The invention of Group IV also does not have the same technical features as Groups II, III and V, as the method of
Group IV as claimed does not require the products of Groups II, III and V. The polypeptide and kits of Group II have a
defined SEQ ID, which are not required in the method of Group IV. In addition, as stated above, functional domains
such as SH3 domains are known in the art.

Groups II and III also lack a single concept. Group II is drawn to polypeptide and Group III is drawn to DNA, and thus
have different structure and function. In addition, as stated above, polypeptides comprising functional domains such as
SH3 domains are known in the art. Group V also does not relate to a single inventive concept, as Group V is drawn to
an antibody, and is not required by the method of Groups I or IV, and is a separate product than the products of Groups
II and III, having a different function and structure.